Use of endonucleases for inserting transgenes into safe harbor loci

ABSTRACT

The present invention concerns endonucleases capable of cleaving a target sequence located in a “safe harbor locus”, i.e. a locus allowing safe expression of a transgene. The present invention further concerns the use of such endonucleases for inserting transgenes into a cell, tissue or individual.

Meganucleases

Meganucleases, also referred to as homing endonucleases, were the first endonucleases used to induce double-strand breaks and recombination in living cells (Rouet et al. PNAS 1994 91:6064-6068; Rouet et al. Mol Cell Biol. 1994 14:8096-8106; Choulika et al. Mol Cell Biol. 1995 15:1968-1973; Puchta et al. PNAS 1996 93:5055-5060). However, their use has long been limited by their narrow specificity. Although several hundred natural meganucleases have been identified over the past years, this diversity is still largely insufficient to address genome complexity, and the probability of finding a meganuclease cleavage site within a gene of interest is still extremely low. These findings highlighted the need for artificial endonucleases with tailored specificities, cleaving chosen sequences with the same selectivity as natural endonucleases.

Meganucleases have emerged as scaffolds of choice for deriving genome engineering tools cutting a desired target sequence (Paques et al. Curr Gen Ther. 2007 7:49-66). Combinatorial assembly processes allowing the engineering of meganucleases with modified specificities have been described (Arnould et al. J Mol. Biol. 2006 355:443-458; Arnould et al. J Mol. Biol. 2007 371:49-65; Smith et al. NAR 2006 34:e149; Grizot et al. NAR 2009 37:5405). Briefly, these processes rely on the identification of locally engineered variants with a substrate specificity that differs from the substrate specificity of the wild-type meganuclease by only a few nucleotides. Up to four sets of mutations identified in such proteins can then be assembled in new proteins in order to generate new meganucleases with an entirely redesigned binding interface.

These processes require two steps, wherein different sets of mutations are first assembled into homodimeric variants cleaving palindromic targets. Two homodimers can then be co-expressed in order to generate heterodimeric meganucleases cleaving the chosen non-palindromic target. The first step of this process remains the most challenging one, and one cannot know in advance with absolute certainty whether a meganuclease cleaving a given locus can be obtained. Indeed, not all sequences are equally likely to be cleaved by engineered meganucleases, and in certain cases, meganuclease engineering could prove difficult (Galetto et al. Expert Opin Biol Ther. 2009 9:1289-303).
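The relationship between a non-palindromic target and the two palindromic targets used in the first step can be illustrated with a short sketch. It is not taken from the cited works: it assumes, for illustration only, an even-length target split into two equal halves, and the function names and the 22-nucleotide example sequence are hypothetical. Each derived palindromic target consists of one half of the chosen target joined to its own reverse complement, so that one homodimeric variant could be assessed against each half.

```python
# Illustrative sketch only: names and the example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """A DNA target is palindromic when it equals its own reverse complement."""
    return seq == reverse_complement(seq)


def derived_palindromic_targets(target: str) -> tuple[str, str]:
    """Build one palindromic target from each half of a non-palindromic target.

    Simplifying assumption: the target has even length and is split in the middle.
    """
    if len(target) % 2:
        raise ValueError("this sketch assumes an even-length target")
    half = len(target) // 2
    left, right = target[:half], target[half:]
    left_palindrome = left + reverse_complement(left)
    right_palindrome = reverse_complement(right) + right
    return left_palindrome, right_palindrome


target = "ACGTTGCAAGTCCGATAGGCTA"  # arbitrary 22-nucleotide example, not a real locus
left_pal, right_pal = derived_palindromic_targets(target)
print(is_palindromic(target), is_palindromic(left_pal), is_palindromic(right_pal))
```

In this sketch the left-derived target shares its first half with the chosen target and the right-derived target shares its second half, which mirrors the idea that each homodimer contributes one half-site specificity to the heterodimer.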

Other Enzymes Suitable for Site-Specific Genome Modifications

Specialized enzymes like integrases, recombinases, transposases and endonucleases have been proposed for site-specific genome modifications. For years, the use of these enzymes remained limited, due to the challenge of retargeting their natural specificities towards desired target sites. Indeed, the target sites of these proteins, or sequences with a sufficient degree of sequence identity, should be present in the sequences neighboring the mutations to be corrected, or within the gene to be inactivated, which is usually not the case, except in the case of pre-engineered sequences. The use of these DNA modifying enzymes in gene therapy therefore depends on the possibility of redesigning their DNA binding properties. Many strategies have been developed, aiming to obtain artificial proteins with tailored substrate specificities.

The integrase from the Streptomyces phage PhiC31 was used early for targeted gene transfer in an endogenous locus. This enzyme mediates recombination of the phage genome into the bacterial chromosome through a site-specific reaction between the phage attachment site (attP) and the bacterial attachment site (attB) (Kuhstoss et al. J Mol Biol 1991 222:897-908; Rausch et al. NAR 1991 19:5187-5189). This can occur from plasmids carrying attB sites into native genomic sequences harboring partial identity with attP, called pseudo attP sites (attP′). The PhiC31 integrase has been used to transfer several transgenes, including hFIX, in the human genome (Olivares et al. Nat Biotech 2002 20:1124-1128; Ginsburg et al. Adv Genet. 2005 54:179-187; Calos Curr Gene Ther 2006 6:633-645; Chalberg et al. J Mol Biol 2006 357:28-48; Aneja et al. J Gene Med 2007 9:967-975). The drawback here is that the site where integration can occur cannot be chosen (Chalberg et al. J Mol Biol 2006 357:28-48), and one has to rely on pseudo attP sites within the human genome for precise integration. Whereas a major integration site is found on chromosome 19, hundreds of other integration loci have been identified (Chalberg et al. J Mol Biol 2006 357:28-48). In recent work, the PhiC31 integrase was mutated in order to increase efficiency and specificity for integration at an attP′ site, paving the way for the development of engineered integrases that target chosen sites (Keravala et al. Mol Ther 2009 17:112-120). However, development of engineered integrases has lagged behind similar efforts focused on targeted recombinase and endonuclease systems.

Site-specific recombinases, such as the Cre recombinase from bacteriophage P1, or the Flp protein from Saccharomyces cerevisiae, have been used to induce recombination between pre-engineered sequences containing their cognate sites. The Cre recombinase recognizes and mediates recombination between two identical 34 bp sites known as loxP (Abremski et al. Cell 1983 32:1301-1311). For many years, a limitation of Cre-derived recombinases has been that repeated loxP, or pseudo loxP sites, must be present in order to allow DNA integration between these two sites. However, directed evolution of the DNA binding interface of this molecule has been used to create recombinases with new specificities (Buchholz et al. Nat Biotech 2001 19:1047-1052; Santoro et al. PNAS 2002 99:4185-4190). The Cre recombinase system has also been useful in providing a framework for the use of DNA targeting enzymes to induce the excision of viral sequences. Indeed, work with a retroviral Moloney murine leukemia virus vector system has shown that, when loxP sites are introduced in the LTR of an integrative retroviral vector, the expression of Cre can result in the deletion of all the sequences between the two loxP sites (Choulika et al. J Virol 1996 70:1792-1798). More recently, an engineered Cre recombinase variant has been used to excise an HIV type 1 provirus from cells (Sarkar et al. Science 2007 316:1912-1915). The recombinase was redesigned to target the proviral LTRs, and used to induce the excision of all intervening sequences. Engineering attempts have also been made with the Flp recombinase, targeting the FRT (Flp Recombination Target) sequence (Buchholz et al. Nat Biotech 1998 16:657-662), and variants recognizing non-native Flp recombination targets have been obtained (Voziyanov et al. J Mol Biol 2003 326:65-76). However, there is no example of targeted insertion in a non-pre-engineered locus with such enzymes today.

Transposons such as piggyBac and Sleeping Beauty (SB) can provide efficient tools for insertion of sequences into vertebrate cells and have been proposed as an alternative to viral mediated gene delivery to achieve long-lasting expression (Izsvak et al. Mol Ther 2004 9:147-156; Ivics et al. Curr Gene Ther 2006 6:593-607; Mates et al. Nat Genet. 2009 41:753-761). Transposons are a natural means of gene delivery in which a DNA sequence present in a DNA molecule is inserted in another location, through the action of the transposase. An engineered SB transposase, called SB100X, was recently shown to increase the efficiency of the process (Mates et al. Nat Genet. 2009 41:753-761). Transposition is random on a genomic level (for example, SB integrates into TA dinucleotides (Vigdal et al. J Mol Biol 2002 323:441-452)), and transposons should therefore not be considered as tools for targeted approaches. However, further work has shown the possibility of chromosomal transposition mediated by engineered transposases in human cells, by fusing the transposase catalytic domain to specific DNA binding domains (Ivics et al. Mol Ther 2007 15:1137-1144), paving the way for the development of a new category of targeted tools.

Gene Therapy

The successful treatment of several X-SCID patients by gene therapy nearly 10 years ago was one of the most significant milestones in the field of gene therapy. This tremendous achievement was followed by significant success in other clinical trials addressing different diseases, including another form of SCID, Epidermolysis Bullosa, Leber Amaurosis and others. However, these initial successes have long been overshadowed by a series of serious adverse events, i.e. the appearance of leukemia in X-SCID treated patients (Hacein-Bey-Abina et al. Science 2003 302:415-419; Hacein-Bey-Abina et al. J Clin Invest. 2008 118:3132-3142; Howe et al. J Clin Invest. 2008 118:3143-3150). All cases of leukemia, but one, could eventually be treated by chemotherapy, and the approach appears globally as a success, but these serious adverse effects highlighted the major risks of current gene therapy approaches.

There is thus a need in the art for a safe method for inserting a gene into the genome of a subject.

Most of the gene therapy protocols that are being developed these days for the treatment of inherited diseases are based on the complementation of a variant allele by an additional and functional copy of the disease-causing gene. In non-dividing tissues, such as retina, delivering this copy can be accomplished using a non-integrative vector, derived for example from an Adeno-Associated Virus (AAV). However, when targeting stem cells, such as hematopoietic stem cells (HSCs), whose fate is to proliferate, persistent expression becomes an issue, and there is a need for integrative vectors. Retroviral vectors, which integrate in the genome and replicate with the host's chromosomes, have proved efficient for this purpose, but the random nature of their insertion has raised various concerns, all linked with gene expression. The cases of leukemia observed in the X-SCID trials were clearly linked to the activation of a proto-oncogene in the vicinity of the integration sites. In addition, inappropriate expression of the transgene could result in metabolic or immunological problems. Finally, insertion could result in the knock-out of endogenous genes.

Site-specific integration would be a promising alternative to random integration of viral vectors since it could alleviate the risks of insertional mutagenesis (Kolb et al. Trends Biotechnol. 2005 23:399-406; Porteus et al. Nat. Biotechnol. 2005 23:967-973; Paques et al. Curr Gen Ther. 2007 7:49-66). However, it is relatively tedious to engineer tools for targeted recombination. In addition, each tool has its intrinsic properties in terms of activity and specificity.

Therefore, there is a need in the art for a tool allowing the targeted insertion of transgenes into loci of the genome that can be considered as “safe harbors” for gene addition. In addition, it would be extremely advantageous if this tool could be used for inserting transgenes irrespective of their sequences, thereby allowing the treatment of numerous diseases by gene therapy using the same tool. Moreover, it would be extremely advantageous if this tool allowed inserting transgenes into the genome with high efficacy, and led to stable expression of the transgene at high levels.

SUMMARY OF THE INVENTION

The invention is notably drawn to the following embodiments:

Embodiment 1

A variant endonuclease capable of cleaving a target sequence for use in inserting a transgene into the genome of an individual, wherein:

i. said genome comprises a locus comprising said target sequence; and
ii. said target sequence is located at a distance of at most 200 kb from a retroviral insertion site (RIS), wherein said RIS is neither associated with cancer nor with abnormal cell proliferation.

Embodiment 2

The endonuclease according to embodiment 1, wherein insertion of said transgene does not substantially modify expression of genes located in the vicinity of the target sequence.

Embodiment 3

The endonuclease according to embodiment 1 or 2, wherein said target sequence is located at a distance of at least 100 kb from the nearest genes.

Embodiment 4

The endonuclease according to any one of embodiments 1 to 3, wherein said RIS has been identified in cells from a patient treated by gene therapy by transduction of stem cells.

Embodiment 5

The endonuclease according to any one of embodiments 1 to 3, wherein said RIS has been identified in cells from a patient treated by gene therapy by transduction of hematopoietic stem cells.

Embodiment 6

The endonuclease according to any one of embodiments 1 to 5, wherein said endonuclease is a homing endonuclease.

Embodiment 7

The endonuclease according to embodiment 6, wherein said homing endonuclease is a member of the family of LAGLIDADG endonucleases.

Embodiment 8

The endonuclease according to embodiment 7, wherein said member of the family of LAGLIDADG endonucleases is I-CreI.

Embodiment 9

The endonuclease according to any one of embodiments 1 to 8, wherein said locus is selected from the SH3 locus on human chromosome 6p25.1, the SH4 locus on human chromosome 7q31.2, the SH6 locus on human chromosome 21q21.1, the SH12 locus on human chromosome 13q34, the SH13 locus on human chromosome 3p12.2, the SH19 locus on human chromosome 22, the SH20 locus on human chromosome 12q21.2, the SH21 locus on human chromosome 3p24.1, the SH33 locus on human chromosome 6p12.2, the SH7 locus on human chromosome 2p16.1 and the SH8 locus on human chromosome 5.

Embodiment 10

In vitro or ex vivo use of an endonuclease as defined in any one of embodiments 1 to 9 for inserting a transgene into the genome of a cell or a tissue.

Embodiment 11

A variant dimeric I-CreI protein comprising two monomers that comprise a sequence at least 80% identical to SEQ ID NO: 1 or SEQ ID NO: 42, wherein:

i. said dimeric I-CreI protein is capable of cleaving a target sequence located within a locus of an individual, said target sequence being located at a distance of at most 200 kb from a retroviral insertion site (RIS), and said RIS being neither associated with cancer nor with abnormal cell proliferation; and
ii. said target sequence does not comprise a sequence of SEQ ID NO: 4.

Embodiment 12

The dimeric I-CreI protein according to embodiment 11, wherein said dimeric I-CreI protein is capable of cleaving a target sequence located within the SH3 locus on human chromosome 6p25.1.

Embodiment 13

The dimeric I-CreI protein according to embodiment 12, wherein said target sequence comprises the sequence of SEQ ID NO: 2.

Embodiment 14

The dimeric I-CreI protein according to embodiment 12 or 13, wherein said protein comprises:

a) a first monomer that comprises amino acid substitutions at positions 30, 38, 70 and 75 of SEQ ID NO: 1; and
b) a second monomer that comprises amino acid substitutions at positions 44, 54, 70 and 75 of SEQ ID NO: 1.

Embodiment 15

The dimeric I-CreI protein according to embodiment 14, wherein said polypeptide comprises:

a) a first monomer comprising 30G 38R 70D 75N 86D mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations;
    ii. a monomer comprising 44A 54L 70Q 75Y 92R 158R 162A mutations;
    iii. a monomer comprising 4E 44A 54L 64A 70Q 75N 158R 162A mutations;
    iv. a monomer comprising 44A 54L 64A 70Q 75N 158W 162A mutations;
    v. a monomer comprising 44A 54L 70Q 75N mutations;
    vi. a monomer comprising 44A 54L 57E 70Q 75N 158R 162A mutations; and
    vii. a monomer comprising 44V 54L 70Q 75N 77V mutations.
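The shorthand used in these embodiments lists a position followed by the residue present at that position in the variant (for example, 44A denotes an alanine at position 44, numbered with reference to SEQ ID NO: 1). A minimal sketch of how such a listing could be parsed and applied is given below. The function names are hypothetical, and the scaffold is a placeholder string of X characters of arbitrary length rather than the actual sequence of SEQ ID NO: 1, which is not reproduced here.

```python
import re

# One token of the shorthand: a 1-based position followed by a one-letter residue code.
MUTATION_TOKEN = re.compile(r"^(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutations(listing: str) -> dict[int, str]:
    """Parse a listing such as '30G 38R 70D' into {30: 'G', 38: 'R', 70: 'D'}."""
    mutations: dict[int, str] = {}
    for token in listing.split():
        match = MUTATION_TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised mutation token: {token!r}")
        mutations[int(match.group(1))] = match.group(2)
    return mutations


def apply_mutations(scaffold: str, mutations: dict[int, str]) -> str:
    """Return a copy of the scaffold with each listed position substituted."""
    residues = list(scaffold)
    for position, residue in mutations.items():
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the scaffold")
        residues[position - 1] = residue  # listed positions are 1-based
    return "".join(residues)


# Placeholder scaffold (hypothetical): arbitrary length, long enough for position 162.
scaffold = "X" * 170
first_monomer = apply_mutations(scaffold, parse_mutations("30G 38R 70D 75N 86D"))
second_monomer = apply_mutations(scaffold, parse_mutations("44A 54L 64A 70Q 75N 158R 162A"))
print(first_monomer[29], second_monomer[43])
```

Representing each monomer as a position-to-residue mapping makes it straightforward to compare the alternative second monomers listed in an embodiment, since two listings differ exactly where their mappings differ.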

Embodiment 16

The dimeric I-CreI protein according to embodiment 14, wherein said polypeptide comprises:

a) a first monomer comprising 30G 38R 70D 75N 81T 154G mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 44A 54L 70Q 75N 105A 158R 162A mutations;
    ii. a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations;
    iii. a monomer comprising 4E 44A 54L 64A 70Q 75N 158R 162A mutations;
    iv. a monomer comprising 44A 54L 64A 70Q 75N 158W 162A mutations;
    v. a monomer comprising 44A 54L 70Q 75N mutations; and
    vi. a monomer comprising 44V 54L 70Q 75N 77V mutations.

Embodiment 17

The dimeric I-CreI protein according to embodiment 14, wherein said polypeptide comprises:

a) a first monomer comprising 30G 38R 50R 70D 75N 142R mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 44A 54L 70Q 75N 105A 158R 162A mutations;
    ii. a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations;
    iii. a monomer comprising 44A 54L 70Q 75Y 92R 158R 162A mutations;
    iv. a monomer comprising 4E 44A 54L 64A 70Q 75N 158R 162A mutations;
    v. a monomer comprising 44A 54L 64A 70Q 75N 158W 162A mutations;
    vi. a monomer comprising 44A 54L 66C 70Q 71R 75N 151A 158R 162A mutations;
    vii. a monomer comprising 44A 54L 70Q 75N mutations;
    viii. a monomer comprising 44A 54L 57E 70Q 75N 158R 162A mutations; and
    ix. a monomer comprising 44V 54L 70Q 75N 77V mutations.

Embodiment 18

The dimeric I-CreI protein according to embodiment 11, wherein said dimeric I-CreI protein is capable of cleaving a target sequence located within the SH4 locus on human chromosome 7q31.2.

Embodiment 19

The dimeric I-CreI protein according to embodiment 18, wherein said target sequence comprises the sequence of SEQ ID NO: 3.

Embodiment 20

The dimeric I-CreI protein according to embodiment 18 or 19, wherein said protein comprises:

a) a first monomer that comprises amino acid substitutions at positions 24, 70, 75 and 77 of SEQ ID NO: 1; and
b) a second monomer that comprises amino acid substitutions at positions 24, 44 and 70 of SEQ ID NO: 1.

Embodiment 21

The dimeric I-CreI protein according to embodiment 20, wherein said polypeptide comprises:

a) a first monomer selected from the group consisting of:
    i. a monomer comprising 24V 44R 68Y 70S 75Y 77N mutations;
    ii. a monomer comprising 24V 68A 70S 75N 77R mutations; and
    iii. a monomer comprising 24V 70D 75N 77R mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 24V 44Y 70S mutations; and
    ii. a monomer comprising 24V 44Y 70S 77V mutations.

Embodiment 22

The dimeric I-CreI protein according to embodiment 11, wherein said dimeric I-CreI protein is capable of cleaving a target sequence located within the SH6 locus on human chromosome 21q21.1.

Embodiment 23

The dimeric I-CreI protein according to embodiment 22, wherein said target sequence comprises the sequence of SEQ ID NO: 59.

Embodiment 24

The dimeric I-CreI protein according to embodiment 22 or 23, wherein said protein comprises:

a) a first monomer that comprises amino acid substitutions at position 44, and optionally at positions 70 and/or 75 of SEQ ID NO: 1; and
b) a second monomer that comprises amino acid substitutions at positions 28, 40, 44, 70 and 75 of SEQ ID NO: 1.

Embodiment 25

The dimeric I-CreI protein according to embodiment 24, wherein said polypeptide comprises:

a) a first monomer comprising 44K 68T 70G 75N mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 28Q 40R 44A 70L 75N 96R 111H 144S mutations;
    ii. a monomer comprising 7R 28Q 40R 44A 70L 75N 85R 103T mutations;
    iii. a monomer comprising 28Q 40R 44A 70L 75N 103S mutations;
    iv. a monomer comprising 24F 27V 28Q 40R 44A 70L 75N 99R mutations;
    v. a monomer comprising 7R 28Q 40R 44A 70L 75N 81T mutations;
    vi. a monomer comprising 7R 28Q 40R 44A 70L 75N 77V mutations;
    vii. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T 121E 132V 160R mutations;
    viii. a monomer comprising 28Q 40R 44A 70L 75N mutations;
    ix. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T mutations; and
    x. a monomer comprising 28Q 34R 40R 44A 70L 75N 81V 103T 108V 160E mutations.

Embodiment 26

The dimeric I-CreI protein according to embodiment 24, wherein said polypeptide comprises:

a) a first monomer comprising a 44K mutation, and optionally 70S and/or 75N mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 28Q 40R 44A 70L 75N 96R 111H 144S mutations;
    ii. a monomer comprising 7R 28Q 40R 44A 70L 75N 85R 103T mutations;
    iii. a monomer comprising 28Q 40R 44A 70L 75N 103S mutations;
    iv. a monomer comprising 24F 27V 28Q 40R 44A 70L 75N 99R mutations;
    v. a monomer comprising 7R 28Q 40R 44A 70L 75N 81T mutations;
    vi. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T 121E 132V 160R mutations;
    vii. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T mutations; and
    viii. a monomer comprising 28Q 34R 40R 44A 70L 75N 81V 103T 108V 160E mutations.

Embodiment 27

A fusion protein comprising the monomers of the dimeric I-CreI protein according to any one of embodiments 11 to 26.

Embodiment 28

The fusion protein according to embodiment 27, wherein said monomers are connected by a peptidic linker comprising a sequence of SEQ ID NO: 43.

Embodiment 29

The fusion protein according to embodiment 27 or 28, wherein the C-terminal monomer further comprises K7E and K96E mutations, and wherein the N-terminal monomer further comprises E8K, E61R and G19S mutations.

Embodiment 30

The fusion protein according to any one of embodiments 27 to 29, wherein said fusion protein comprises a sequence selected from the group consisting of SEQ ID NOs: 25-40 and 76-96.

Embodiment 31

A nucleic acid encoding the endonuclease according to any one of embodiments 1 to 9 or the protein according to any one of embodiments 11 to 30.

Embodiment 32

An expression vector comprising the nucleic acid according to embodiment 31.

Embodiment 33

The expression vector according to embodiment 32, further comprising a targeting construct comprising a transgene and two sequences homologous to the genomic sequence flanking a target sequence recognized by the endonuclease as defined in any one of embodiments 1 to 9 or by the protein as defined in any one of embodiments 11 to 30.

Embodiment 34

The expression vector of embodiment 33, wherein said transgene encodes a therapeutic polypeptide.

Embodiment 35

The expression vector according to any one of embodiments 32 to 34 for use in gene therapy.

Embodiment 36

A combination of:

-   an expression vector according to embodiment 32; and
-   a vector comprising a targeting construct comprising a transgene and two sequences homologous to the genomic sequence of a target sequence recognized by the endonuclease as defined in any one of embodiments 1 to 9 or by the protein as defined in any one of embodiments 11 to 30.

Embodiment 37

A pharmaceutical composition comprising the expression vector as defined in any one of embodiments 32 to 34 or the combination as defined in embodiment 36 and a pharmaceutically active carrier.

Embodiment 38

A method of treating an individual by gene therapy comprising administering an effective amount of the expression vector as defined in any one of embodiments 32 to 34 or of the combination as defined in embodiment 36 to an individual in need thereof.

Embodiment 39

A method for obtaining an endonuclease suitable for inserting a transgene into the genome of an individual, comprising the steps of:

a) selecting, within the genome of said individual, a retroviral insertion site (RIS) that is neither associated with cancer nor with abnormal cell proliferation;
b) defining a genomic region extending 200 kb upstream and 200 kb downstream of said RIS; and
c) identifying a wild-type endonuclease or constructing a variant endonuclease capable of cleaving a target sequence located within said genomic region.
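The region defined in step b), and the selection of candidate target positions falling within it, can be sketched as a simple coordinate computation. The sketch below is illustrative only: the data class, the function names, the chromosome label, the chromosome length and all positions are hypothetical, coordinates are assumed to be 1-based positions on a single chromosome, and the window is clamped at the chromosome ends.

```python
from dataclasses import dataclass

WINDOW_BP = 200_000  # 200 kb on each side of the RIS, as in step b)


@dataclass(frozen=True)
class Region:
    chromosome: str
    start: int  # inclusive, 1-based
    end: int    # inclusive, 1-based

    def contains(self, position: int) -> bool:
        return self.start <= position <= self.end


def region_around_ris(chromosome: str, ris_position: int,
                      chromosome_length: int, window: int = WINDOW_BP) -> Region:
    """Return the region extending `window` bp upstream and downstream of the RIS."""
    if not 1 <= ris_position <= chromosome_length:
        raise ValueError("RIS position lies outside the chromosome")
    start = max(1, ris_position - window)
    end = min(chromosome_length, ris_position + window)
    return Region(chromosome, start, end)


def candidate_targets(region: Region, target_positions: list[int]) -> list[int]:
    """Keep only the candidate target positions located within the region."""
    return [p for p in target_positions if region.contains(p)]


# Hypothetical values, for illustration only.
region = region_around_ris("chrN", 1_500_000, 50_000_000)
kept = candidate_targets(region, [1_250_000, 1_300_001, 1_699_999, 1_800_000])
print(region, kept)
```

Step a), the selection of an RIS that is not associated with cancer or abnormal cell proliferation, and step c), the identification or engineering of the endonuclease itself, are outside the scope of this coordinate sketch.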

Embodiment 40

Use of the endonuclease according to any one of embodiments 1 to 9, or of the protein according to any one of embodiments 11 to 30, or of the nucleic acid according to embodiment 31, or of the expression vector according to any one of embodiments 32 to 34, or of the combination according to embodiment 36, for inserting a transgene into the genome of a cell, tissue or non-human animal, wherein said use is not therapeutic.

Embodiment 41

The use of embodiment 40, for making a non-human animal model of a hereditary disorder.

Embodiment 42

The use of embodiment 40, for producing a recombinant protein.

Embodiment 43

A non-human transgenic animal comprising a nucleic acid according to embodiment 31, or an expression vector according to any one of embodiments 32 to 34, or a combination according to embodiment 36 in its genome.

DETAILED DESCRIPTION OF THE INVENTION

The inventors have identified “safe harbor” loci within the genome allowing safe expression of a transgene through targeted insertion, wherein (i) said loci are close to a retroviral insertion site identified in a cell from a patient treated by gene therapy, and (ii) said retroviral insertion site is not associated with cancer or abnormal cell proliferation. As is immediately apparent from the following description and examples, the safe harbor loci according to the invention may either be located within the intron of a gene, or within an intergenic region.

In particular, the inventors have found that endonucleases could be engineered in such a way as to target said safe harbors for gene addition.

More specifically, the inventors have engineered several I-CreI meganucleases that are capable of recognizing and cleaving target sequences located within different safe harbor loci, for instance the SH6 locus, the SH3 locus, the SH4 locus, the SH12 locus, the SH13 locus, the SH19 locus, the SH20 locus, the SH21 locus, the SH33 locus, the SH7 locus, the SH8 locus, the SH18 locus, the SH31 locus, the SH38 locus, the SH39 locus, the SH41 locus, the SH42 locus, the SH43 locus, the SH44 locus, the SH45 locus, the SH46 locus, the SH47 locus, the SH48 locus, the SH49 locus, the SH50 locus, the SH51 locus, the SH52 locus, the SH70 locus, the SH71 locus, the SH72 locus, the SH73 locus, the SH74 locus, the SH75 locus, the SH101 locus, the SH106 locus, the SH107 locus, the SH102 locus, the SH105 locus, the SH103 locus, the SH104 locus, the SH113 locus, the SH109 locus, the SH112 locus, the SH108 locus, the SH110 locus, the SH114 locus, the SH116 locus, the SH111 locus, the SH115 locus, the SH121 locus, the SH120 locus, the SH122 locus, the SH117 locus, the SH118 locus, the SH119 locus, the SH123 locus, the SH126 locus, the SH128 locus, the SH129 locus, the SH124 locus, the SH131 locus, the SH125 locus, the SH127 locus, the SH130 locus, the SH11 locus, the SH17 locus, the SH23 locus, the SH34 locus, the SH40 locus, the SH53 locus, the SH54 locus, the SH55 locus, the SH56 locus, the SH57 locus, the SH58 locus, the SH59 locus, the SH60 locus, the SH61 locus, the SH62 locus, the SH65 locus, the SH67 locus, the SH68 locus and the SH69 locus, which are further described herein.

It has further been shown that these meganucleases can cleave their target sequences efficiently.

These meganucleases, as well as other enzymes like integrases, recombinases and transposases, can therefore be used as a tool for inserting a transgene into safe harbors, thereby avoiding the appearance of adverse events such as leukemia in the frame of gene therapy. In addition, these meganucleases, as well as other enzymes like integrases, recombinases and transposases, can be used for inserting any transgene into the safe harbor starting from a single targeting construct, irrespective of the sequence of the transgene.

Endonucleases According to the Invention and Uses Thereof.

The invention therefore relates to:

-   an endonuclease capable of cleaving a target sequence for use in inserting a transgene into the genome of an individual, wherein (i) said genome comprises a locus comprising said target sequence, and (ii) said target sequence is located at a distance of at most 200 kb from a retroviral insertion site (RIS), wherein said RIS is neither associated with cancer nor with abnormal cell proliferation.
-   an in vitro or ex vivo use of an endonuclease capable of cleaving a target sequence for inserting a transgene into the genome of a cell or a tissue, wherein (i) said genome comprises a locus comprising said target sequence, and (ii) said target sequence is located at a distance of at most 200 kb from a retroviral insertion site (RIS), wherein said RIS is neither associated with cancer nor with abnormal cell proliferation.
-   a method for inserting a transgene into the genome of an individual comprising the steps of (i) providing an endonuclease capable of cleaving a target sequence, wherein said genome comprises a locus comprising said target sequence, and said target sequence is located at a distance of at most 200 kb from a retroviral insertion site (RIS) that is neither associated with cancer nor with abnormal cell proliferation; and (ii) contacting an individual with a transgene and with said endonuclease, whereby said transgene is inserted into said locus of the genome of the individual.

As used herein, the term “endonuclease” refers to any wild-type or variant enzyme capable of catalyzing the hydrolysis (cleavage) of bonds between nucleic acids within a DNA or RNA molecule, preferably a DNA molecule. The endonucleases according to the present invention do not cleave the DNA or RNA molecule irrespective of its sequence, but recognize and cleave the DNA or RNA molecule at specific polynucleotide sequences, further referred to as “target sequences” or “target sites”. Target sequences recognized and cleaved by an endonuclease according to the invention are referred to as target sequences according to the invention.

The endonuclease according to the invention can for example be a homing endonuclease (Paques et al. Curr Gen Ther. 2007 7:49-66), a chimeric Zinc-Finger nuclease (ZFN) resulting from the fusion of engineered zinc-finger domains with the catalytic domain of a restriction enzyme such as FokI (Porteus et al. Nat. Biotechnol. 2005 23:967-973) or a chemical endonuclease (Arimondo et al. Mol Cell Biol. 2006 26:324-333; Simon et al. NAR 2008 36:3531-3538; Eisenschmidt et al. NAR 2005 33:7039-7047; Cannata et al. PNAS 2008 105:9576-9581). In chemical endonucleases, a chemical or peptidic cleaver is conjugated either to a polymer of nucleic acids or to another DNA recognizing a specific target sequence, thereby targeting the cleavage activity to a specific sequence.

The endonuclease according to the invention is preferably a homing endonuclease, also known under the name of meganuclease. Such homing endonucleases are well-known to the art (see e.g. Stoddard, Quarterly Reviews of Biophysics, 2006, 38:49-95). Homing endonucleases recognize a DNA target sequence and generate a single- or double-strand break. Homing endonucleases are highly specific, recognizing DNA target sites ranging from 12 to 45 base pairs (bp) in length, usually ranging from 14 to 40 bp in length. The homing endonuclease according to the invention may for example correspond to a LAGLIDADG endonuclease, to an HNH endonuclease, or to a GIY-YIG endonuclease. Examples of such endonucleases include I-Sce I, I-Chu I, I-Cre I, I-Csm I, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pfu I, PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I, I-MsoI.

In a preferred embodiment, the homing endonuclease according to the invention is a LAGLIDADG endonuclease such as I-SceI, I-CreI, I-CeuI, I-MsoI, and I-DmoI.

In a most preferred embodiment, said LAGLIDADG endonuclease is I-CreI. Wild-type I-CreI is a homodimeric homing endonuclease that is capable of cleaving a 22 to 24 bp double-stranded target sequence. The sequence of a wild-type monomer of I-CreI includes the sequence shown as SEQ ID NO: 1 (which corresponds to the I-CreI sequence of pdb accession number 1g9y) and the sequence shown in SwissProt Accession n° P05725 (in particular the sequence shown in version 73, last modified Nov. 3, 2009).

In the present patent application, the I-CreI variants may comprise an additional alanine after the first methionine of the wild type I-CreI sequence, and three additional amino acid residues at the C-terminal extremity (see sequence of SEQ ID NO: 42 and FIG. 11). These three additional amino acid residues consist of two additional alanine residues and one aspartic acid residue after the final proline of the wild type I-CreI sequence. These additional residues do not affect the properties of the enzyme. For the sake of clarity, these additional residues do not affect the numbering of the residues in I-CreI or variants thereof. More specifically, the numbering used herein exclusively refers to the position of residues in the wild type I-CreI enzyme of SEQ ID NO: 1. For instance, the second residue of wild-type I-CreI is in fact the third residue of a variant of SEQ ID NO: 42 since this variant comprises an additional alanine after the first methionine.
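The numbering convention described above can be illustrated with a minimal sketch. The function name `variant_index` is hypothetical and introduced only for illustration; the sketch assumes 1-based residue positions and a variant carrying the single additional alanine right after the first methionine, as in SEQ ID NO: 42.

```python
def variant_index(wild_type_position: int) -> int:
    """Map a residue position in wild-type I-CreI (SEQ ID NO: 1 numbering)
    to its physical index in a variant carrying one extra alanine after
    the first methionine (illustrative helper, not part of the source)."""
    if wild_type_position < 1:
        raise ValueError("residue positions are 1-based")
    # The initial methionine keeps index 1; every later residue is shifted by one.
    return 1 if wild_type_position == 1 else wild_type_position + 1


print(variant_index(2))  # second wild-type residue is the third residue of the variant
```

The residues appended at the C-terminal extremity do not shift any wild-type position, which is why only the N-terminal alanine appears in the mapping.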

In the present application, I-CreI variants may be homodimers (meganuclease comprising two identical monomers), heterodimers (meganuclease comprising two non-identical monomers) and single-chains.

The invention encompasses both wild-type (naturally-occurring) and variant endonucleases. In a preferred embodiment, the endonuclease according to the invention is a “variant” endonuclease, i.e. an endonuclease that does not exist in nature and that is obtained by genetic engineering or by random mutagenesis. The variant endonuclease according to the invention can for example be obtained by substitution of at least one residue in the amino acid sequence of a wild-type, naturally-occurring, endonuclease with a different amino acid. Said substitution(s) can for example be introduced by site-directed mutagenesis and/or by random mutagenesis. In the frame of the present invention, such variant endonucleases remain functional, i.e. they retain the capacity of recognizing and specifically cleaving a target sequence.

The variant endonuclease according to the invention cleaves a target sequence that is different from the target sequence of the corresponding wild-type endonuclease. For example, the target sequence of a variant I-CreI endonuclease is different from the sequence of SEQ ID NO: 4. Methods for obtaining such variant endonucleases with novel specificities are well-known in the art.

The present invention is based on the finding that such variant endonucleases with novel specificities can be used for inserting a gene into a “safe harbor” locus of the genome of a cell, tissue or individual.

As used herein, the term “locus” is the specific physical location of a DNA sequence (e.g. of a gene) on a chromosome. As used in this specification, the term “locus” usually refers to the specific physical location of an endonuclease's target sequence on a chromosome. Such a locus, which comprises a target sequence that is recognized and cleaved by an endonuclease according to the invention, is referred to as “locus according to the invention”.

Ideally, insertion into a safe harbor locus should have no impact on the expression of other genes. Testing these properties is a multi-step process, and a first pre-screening of candidate safe harbor loci by bioinformatic means is desirable. One can thus first identify loci in which targeted insertion is unlikely to result in insertional mutagenesis.

One of the major features of a locus according to the invention is that (i) it is located in a region wherein retroviral insertion was observed in a cell from a patient, in a gene therapy clinical trial, and (ii) said retroviral insertion has not been associated with a cancer or an abnormal cell proliferation.

Indeed, one way to identify safe harbor loci according to the invention is to use the data generated by former gene therapy trials. In the X-SCID trial, insertions of retroviral vector-borne transgenes next to the LMO2 and CCND2 genes have been shown to be associated with leukemia. The follow-up of vector insertions in patients has clearly demonstrated that cells carrying such an insertion had outnumbered the other modified cells after a process lasting several years (Hacein-Bey-Abina et al. Science 2003 302:415-9; Deichmann et al. J. of Clin. Invest. 2007 117:2225-32; Cavazzana-Calvo et al. Blood 2007 109:4575-4581). In another clinical trial, insertions in several loci were found to trigger a high proliferation rate in two patients (Ott et al. Nat Med 2006 12:401-9). In these cases, proliferation seemed to be a consequence of the insertional activation of the MDS1-EVI1, PRDM16, or SETBP1 genes. Although malignancy was not observed initially, EVI1 activation eventually resulted in myelodysplasia in both patients (Stein et al., Nat. Med. 2010 16: 198-205). More generally, even if non-oncogenic, cell proliferation resulting from activation of a gene close to the insert could represent a first step towards malignancy, and therefore lead to potential safety problems. In order to better understand the pattern of viral vector integration, and its potential consequences on the fate of transformed cells, several large-scale studies of Retroviral Insertion Sites (RIS) have been conducted in patients from gene therapy trials (Mavilio et al., Nat Med 2006:1397-1402; Recchia et al. PNAS 2006:1457-62; Aiuti, et al. J Clin Invest 2007:2233-40; Schwarzwaelder et al. J Clin Invest 2007:2241-9; Deichmann et al. J Clin Invest 2007:2225-32). RIS which are not associated with leukemia or with abnormal cell proliferation can be considered as safe harbors. Therefore, the locus according to the invention preferably overlaps or is close to a RIS identified in a clinical trial, and yet not associated with cancer or abnormal cell proliferation.

More specifically, the locus according to the invention is defined as a locus comprising a target sequence that is located at a distance of at most 200, 180, 150, 100 or 50 kb from a retroviral insertion site (RIS), said RIS being neither associated with cancer nor with abnormal cell proliferation. Such loci are referred to as “safe harbor” loci according to the invention (or loci according to the invention), i.e. loci that are safe for insertion of transgenes.
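The distance criterion above lends itself to a short bioinformatic check. The following is a minimal sketch, not the applicant's tool: the function names, the `(chromosome, position)` tuple layout and the convention 1 kb = 1000 bases are assumptions made for illustration, and the list passed as `safe_ris` is assumed to already exclude RIS associated with cancer or abnormal cell proliferation.

```python
# Distance thresholds named in the definition, from most to least stringent (kb).
THRESHOLDS_KB = (50, 100, 150, 180, 200)


def distance_to_nearest_ris(chrom, target_pos, safe_ris):
    """Distance in bases from a target to the closest RIS on the same chromosome.

    safe_ris: iterable of (chromosome, position) tuples (hypothetical layout).
    Returns None when no RIS lies on that chromosome.
    """
    distances = [abs(pos - target_pos) for c, pos in safe_ris if c == chrom]
    return min(distances) if distances else None


def tightest_threshold_kb(distance_bp):
    """Smallest listed threshold (kb) satisfied by the distance, or None."""
    if distance_bp is None:
        return None
    for kb in THRESHOLDS_KB:
        if distance_bp <= kb * 1000:
            return kb
    return None


# Distances (bases) of the kind listed in the RIS distance column of Table A'.
for name, dist in (("SH3", 40782), ("SH4", 77337)):
    print(name, tightest_threshold_kb(dist))
```

A target whose distance exceeds 200 kb yields `None` and would fall outside the definition.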

By “Retroviral insertion sites” (RIS) is meant a genomic site which was identified as an insertion site for a retroviral vector in a cell from a patient treated by gene therapy with said retroviral vector. Such RIS are well-known to the art. They include but are not limited to those described in Schwarzwaelder et al. (J. Clin. Invest. 2007 117:2241), Deichmann et al. (J. of Clin. Invest. 2007 117:2225), Aiuti et al. (J. Clin. Invest. 2007 117:2233), Recchia et al. (PNAS 2006 103:1457) and Mavilio et al. (Nature Medicine 12:1397, 2006).

By “retroviral vector” is meant any vector derived from a virus from the Retroviridae family.

The RIS according to the invention is neither associated with cancer nor with abnormal cell proliferation. RIS known to be associated with leukemia or with abnormal cell proliferation are well known in the art and can easily be excluded by the person skilled in the art. Such RIS known to be associated with leukemia or with abnormal cell proliferation include, e.g., insertion sites next to the LMO2, CCND2, MDS1-EVI1, PRDM16, and SETBP1 genes.

In a more preferred embodiment according to the invention, the RIS used to define safe harbor loci have been identified in a clinical trial, with the transduced cells being stem cells. The RIS can thus have been identified in cells from a patient treated by gene therapy by transduction of stem cells.

In another most preferred embodiment according to the invention, the RIS used to define safe harbor loci have been identified in a clinical trial for SCID patients, with the transduced cells being hematopoietic stem cells (HSCs). The RIS can thus have been identified in cells from a patient treated by gene therapy by transduction of hematopoietic stem cells.

Furthermore, more stringent criteria for definition of a RIS according to the invention can be used.

Among RIS, Common Integration Sites (CIS) are loci in which the statistical overrepresentation of RIS could be interpreted as the consequence of a high cell proliferation rate upon insertion (Mikkers et al., 2003, Nat. Genet. 32:153; Lund et al., 2002, Nat. Genet. 32:160; Hemati et al. 2004, PLOS Biol. 2:e423; Suzuki et al., 2002, Nat. Genet. 32:166-174; Deichmann et al. J. of Clin. Invest. 2007 117:2225-32). For example, Deichmann et al. (J. of Clin. Invest. 2007 117:2225-32) made a survey of RIS from 9 X-SCID patients treated by gene therapy, and found 572 unique RIS that could be mapped unequivocally to the human genome. Among them, they defined CIS of second, third, fourth, fifth, and higher order. CIS of second order were defined by the occurrence of two retroviral insertions within a 30 kb distance, CIS of third, fourth and fifth order by the occurrence of 3, 4 or 5 insertions within 50, 100 or 200 kb, respectively. 122 RIS were found in 47 different CIS loci, 33-fold the value expected under random distribution of the RIS. Eleven CIS were found to localize next to proto-oncogenes, including ZNF217, VAV-3, CCND2, LMO2, MDS1, BCL2L1, NOTCH2, SOCS2, RUNX1, RUNX3, and SEPT6.
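The CIS orders quoted above (two insertions within 30 kb, three within 50 kb, four within 100 kb, five within 200 kb) can be sketched as a sliding scan over sorted insertion positions. This is an illustrative reading of those definitions, not the procedure used in the cited study: the function names are hypothetical, all positions are assumed to lie on one chromosome, 1 kb is taken as 1000 bases, and "within N kb" is interpreted as the span between the first and last insertion of the group being at most N kb.

```python
# CIS order -> window size in kb, as quoted in the text above.
CIS_WINDOWS_KB = {2: 30, 3: 50, 4: 100, 5: 200}


def cis_clusters(positions, order, window_kb):
    """Groups of `order` consecutive insertions spanning at most window_kb."""
    pos = sorted(positions)
    window_bp = window_kb * 1000
    return [
        tuple(pos[i:i + order])
        for i in range(len(pos) - order + 1)
        if pos[i + order - 1] - pos[i] <= window_bp
    ]


def highest_cis_order(positions):
    """Highest CIS order (2 to 5) supported by the insertions, or 0 if none."""
    best = 0
    for order, window_kb in CIS_WINDOWS_KB.items():
        if cis_clusters(positions, order, window_kb):
            best = max(best, order)
    return best


# Made-up insertion positions on a single chromosome, for illustration only.
example = [100_000, 120_000, 145_000, 900_000]
print(cis_clusters(example, 2, 30))
print(highest_cis_order(example))
```

Overlapping groups are reported separately here; merging them into a single CIS locus would be a further design choice that the text does not specify.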

To ensure maximal safety, it could be preferred to avoid RIS located within CIS. Therefore, in a preferred embodiment according to the invention, the target sequence according to the invention is not located in a CIS. In addition, said target sequence or locus is preferably located at a distance of at least 50, 100 or 200 kb from a RIS being part of a common integration site (CIS).

By “Common Integration site” (CIS) is meant a genomic region of 30 kb, 50 kb, 100 kb or 200 kb wherein RIS identified in clinical trials are overrepresented (assuming a random distribution of insertions). Such CIS are well known in the art and are described in Schwarzwaelder et al. (J. Clin. Invest. 2007 117:2241), Deichmann et al. (J. of Clin. Invest. 2007 117:2225), Aiuti et al. (J. Clin. Invest. 2007 117:2233), Recchia et al. (PNAS 2006 103:1457), Mavilio et al. (Nature Medicine 12:1397, 2006) and Gabriel et al. (Nat. Med. 2009 15(12):143.

In addition to the locus being close to a RIS, targeted integration into the locus according to the invention should not result in the disruption of essential functions in the targeted cell.

Therefore, in a specific embodiment according to the invention, insertion into the locus according to the invention preferably does not substantially modify expression of genes located in the vicinity of the target sequence, for example of the nearest genes.

In addition, in another specific embodiment, insertion of a genetic element into said locus preferably does not substantially modify the phenotype of said cell, tissue or individual (except for the phenotype due to expression of the genetic element). By “phenotype” is meant a cell's, a tissue's or an individual's observable traits. The phenotype includes e.g. the viability, the cellular proliferation and/or the growth rate. The person skilled in the art can easily verify that a locus is a safe harbor locus according to the invention e.g. by analyzing the expression pattern of adjacent genes, by carrying out micro-array studies of the transcriptome and/or by characterizing proliferation and/or differentiation abnormalities (if any).

In still another specific embodiment, the locus according to the invention does not comprise any gene. A locus that does not comprise any gene refers to a locus that does not comprise any referenced or known gene. In other terms, such a locus does not comprise any known gene according to sequence databases such as those available on the National Center for Biotechnology Information (NCBI) website. Therefore, the target sequence according to the invention and/or the locus according to the invention can advantageously be located at a distance of at least 1, 5, 10, 25, 50, 100, 180, 200, 250, 300, 400 or 500 kb from the nearest genes.

By “gene” is meant the basic unit of heredity, consisting of a segment of DNA arranged in a linear manner along a chromosome, which codes for a specific protein or segment of protein. A gene typically includes a promoter, a 5′ untranslated region, one or more coding sequences (exons), optionally introns, and a 3′ untranslated region. The gene may further comprise a terminator, enhancers and/or silencers.

By “nearest genes” is meant the one, two or three genes that are located the closest to the target sequence, centromeric and telomeric to the target sequence respectively.
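The notion of nearest genes, and the minimum distance from them, can be sketched as follows. This is an assumption-laden illustration rather than the annotation pipeline behind the tables: the function names and the `(name, start, end)` gene layout are hypothetical, "left" and "right" simply mean lower and higher chromosome coordinates, all genes are assumed to be on the same chromosome as the target, and distances are measured to gene boundaries rather than to the nearest coding sequence.

```python
def nearest_genes(target_pos, genes, n=3):
    """Up to n nearest genes on each side of the target.

    genes: list of (name, start, end) tuples with start <= end.
    Returns two lists of (distance_in_bases, name), sorted by distance.
    """
    left = sorted((target_pos - end, name) for name, start, end in genes if end < target_pos)
    right = sorted((start - target_pos, name) for name, start, end in genes if start > target_pos)
    return left[:n], right[:n]


def is_far_from_genes(target_pos, genes, min_kb):
    """True if the target lies in no gene and at least min_kb from the nearest genes."""
    if any(start <= target_pos <= end for _, start, end in genes):
        return False  # the target lies inside an annotated gene
    left, right = nearest_genes(target_pos, genes, n=1)
    return all(dist >= min_kb * 1000 for dist, _ in left + right)


# Made-up gene coordinates, for illustration only.
genes = [("GENE_A", 1_000, 5_000), ("GENE_B", 300_000, 320_000), ("GENE_C", 900_000, 950_000)]
print(nearest_genes(100_000, genes, n=1))
print(is_far_from_genes(100_000, genes, min_kb=50))
```

Setting `n` to 1, 2 or 3 mirrors the first, second and third nearest genes reported on each side of a target.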

In a preferred embodiment, the locus according to the invention further allows stable expression of the transgene.

In another preferred embodiment, the target sequence according to the invention is only present once within the genome of said cell, tissue or individual.
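Whether a target sequence occurs only once can be checked by counting exact matches on both strands. The sketch below is a toy illustration under stated assumptions: the function names are hypothetical, the "genome" is a short made-up string rather than a real assembly, only exact matches over the letters A, C, G and T are considered, and a real analysis would rely on a genome-scale alignment tool. The target used is the SH3 example sequence listed in Table A.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(genome, target):
    """Count exact occurrences of the target on either strand (overlaps allowed)."""
    genome = genome.upper()
    # A set avoids counting a palindromic target twice.
    strands = {target.upper(), reverse_complement(target)}
    count = 0
    for site in strands:
        start = genome.find(site)
        while start != -1:
            count += 1
            start = genome.find(site, start + 1)
    return count


sh3_target = "CCAATACAAGGTACAAAGTCCTGA"  # SH3 example target from Table A
toy_genome = "GGG" + sh3_target + "TTT"  # made-up context, not real genomic sequence
print(count_sites(toy_genome, sh3_target) == 1)
```

A count of exactly one corresponds to the preferred embodiment; any higher count flags additional potential cleavage sites.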

Once such a safe harbor locus according to the invention has been selected, one can then (i) either construct a variant endonuclease specifically recognizing and cleaving a target sequence located within said locus, e.g. as described in Examples 1, 2 and 5, or (ii) determine whether a known wild-type endonuclease is capable of cleaving a target sequence located within said locus. Alternatively, once a safe harbor locus according to the invention has been selected, the person skilled in the art can insert therein a target sequence that is recognized and cleaved by a known wild-type or variant endonuclease.

Therefore, the invention is drawn to a method for obtaining an endonuclease suitable for inserting a transgene into the genome of an individual, comprising the steps of:

-   a) selecting and/or identifying, within the genome of said individual, a retroviral insertion site (RIS) that is neither associated with cancer nor with abnormal cell proliferation;
-   b) defining a genomic region extending 200 kb upstream and 200 kb downstream of said RIS; and
-   c) identifying a wild-type endonuclease or constructing a variant endonuclease capable of cleaving a target sequence located within said genomic region.

Such an endonuclease allows safe insertion of a transgene into the genome of the cell, tissue or individual, for example without substantially modifying (i) expression of the nearest genes, and/or (ii) the cellular proliferation and/or the growth rate of the cell, tissue or individual.
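The bioinformatic part of steps a) to c) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the method actually used to obtain the endonucleases: the function names and record layouts are hypothetical, 1 kb is taken as 1000 bases, and step c) is reduced to looking up candidate target sites inside the region, since identifying or engineering an enzyme that cleaves them is experimental work outside the scope of the code.

```python
def region_around_ris(ris_pos, flank_kb=200):
    """Step b): region extending flank_kb upstream and downstream of a RIS."""
    flank_bp = flank_kb * 1000
    return max(1, ris_pos - flank_bp), ris_pos + flank_bp


def select_candidate_sites(ris_records, candidate_sites, flank_kb=200):
    """Names of candidate target sites lying within flank_kb of a retained RIS.

    ris_records: (chromosome, position, is_safe) tuples, where is_safe is False
        for RIS associated with cancer or abnormal cell proliferation.
    candidate_sites: (name, chromosome, position) tuples.
    """
    selected = []
    for chrom, pos, is_safe in ris_records:
        if not is_safe:  # step a): discard RIS linked to cancer or proliferation
            continue
        start, end = region_around_ris(pos, flank_kb)  # step b)
        for name, site_chrom, site_pos in candidate_sites:  # step c), lookup only
            if site_chrom == chrom and start <= site_pos <= end and name not in selected:
                selected.append(name)
    return selected


# Made-up records, for illustration only.
ris = [("chr1", 1_000_000, True), ("chr1", 5_000_000, False)]
sites = [("siteA", "chr1", 1_150_000), ("siteB", "chr1", 5_010_000), ("siteC", "chr2", 1_000_000)]
print(select_candidate_sites(ris, sites))
```

Passing `flank_kb=50` reproduces the narrower 50 kb region mentioned for the more stringent variant of step (b).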

All criteria presented hereabove in connection with the locus according to the invention can of course be applied when carrying out the above method. For example, RIS being part of a CIS may be excluded, and/or the genomic region defined at step (b) may only extend 50 kb upstream and 50 kb downstream of said RIS, and/or the locus comprising the target sequence may not comprise any gene.

The locus according to the invention may for example correspond to any one of the SH3, SH4, SH6, SH12, SH13, SH19, SH20, SH21, SH33, SH7 or SH8 loci which are described in Tables A to C below.

Table A provides the location of the locus within the human genome, a target sequence comprised within the locus, the location of the closest RIS as well as the reference to a publication describing the RIS, and examples of endonucleases according to the invention that cleave the locus.

Table B provides information about the nearest genes that are located immediately upstream (at 5′) and downstream (at 3′) of the locus according to the invention. The distance indicates the distance between the target sequence and the nearest coding sequence of the gene.

Tables C and D provide information similar to that of Table B, but for the second nearest genes and for the third nearest genes, respectively.

Tables A′, B′, C′ and D′ provide updated information similar to that in Tables A, B, C and D, respectively, for some loci and associated examples of target sequences within these loci, namely SH3, SH4, SH6, SH8 and SH19. Updated localization information is given by reference to the GRCh37/hg19 version of the human genome assembly.

The locus according to the invention may also correspond to any one of the SH18, SH31, SH38, SH39, SH41, SH42, SH43, SH44, SH45, SH46, SH47, SH48, SH49, SH50, SH51, SH52, SH70, SH71, SH72, SH73, SH74 and SH75 loci which are described in Tables A″ to D″ below.

Table A″ provides the location of the locus within the human genome, a target sequence comprised within the locus, the location of the closest RIS as well as the reference to a publication describing the RIS, the distance between said target and the closest RIS and examples of endonucleases according to the invention that cleave the locus.

Table B″ provides information about the nearest genes that are located immediately upstream (at 5′) and downstream (at 3′) of the locus according to the invention. The distance indicates the distance between the target sequence and the nearest coding sequence of the gene.

Tables C″ and D″ provide information similar to that of Table B″, but for the second nearest genes and for the third nearest genes, respectively.

Locations of loci, targets in these loci and genes are given according to the GRCh37/hg19 version of the human genome assembly.

TABLE A

| Name | Human chromosome | Locus | Example of target sequence comprised within the locus | SEQ ID NO: | Close to a RIS at position: | RIS described in: | Cleaved by meganucleases (examples) of SEQ ID NO: |
|---|---|---|---|---|---|---|---|
| SH3 | 6 | 6p25.1 | CCAATACAAGGTACAAAGTCCTGA | 54 | 6837845 | Deichmann, 2007 | 25-32 |
| SH4 | 7 | 7q31.2 | TTAAAACACTGTACACCATTTTGA | 55 | 114606124 | Schwarzwaelder, 2007 | 33-40 |
| SH6 | 21 | 21q21.1 | TTAATACCCCGTACCTAATATTGC | 59 | 17265069 | Schwarzwaelder, 2007 | 76-85 |
| SH12 | 13 | 13q34 | ATAAAACAAGTCACGTTATTTTGG | 97 | 109463429 | Mavilio, 2006 | 89 |
| SH13 | 3 | 3p12.2 | ATTACACTCTTTAAGTGATTTTAA | 98 | 80607284 | Recchia, 2006 | 90 |
| SH19 | 22 | chr22 | GCAAAACATTGTAAGACATGCCAA | 99 | 46815611 | Aiuti, 2007 | 91 |
| SH20 | 12 | 12q21.2 | GCTGGCTGCTTCACATTGGAGAGA | 100 | 74339720 | Mavilio, 2006 | 92 |
| SH21 | 3 | 3p24.1 | TAGAAATCTGTTAAAAGAGATGAT | 101 | 31235316 | Deichmann, 2007 | 93-95 |
| SH33 | 6 | 6p12.2 | TTTTCATCACTTAAAGTGTTTTAA | 102 | 50055278 | Recchia, 2006 | 96 |
| SH7 | 2 | 2p16.1 | ACAACACTTTGTGAGACGTCTAAG | 103 | 58962165 | Deichmann, 2007 | 86-87 |
| SH8 | 5 | chr5 | ACAATCTGAGGTAAGTAATACTGA | 104 | 20572231 | Aiuti, 2007 | 88 |

TABLE A′

| Name | Human chromosome | Locus | Example of target sequence comprised within the locus | SEQ ID NO: | Close to a RIS at position: | RIS position according to GRCh37/hg19 | RIS distance (bases) | RIS described in: | Cleaved by meganucleases (examples) of SEQ ID NO: |
|---|---|---|---|---|---|---|---|---|---|
| SH3 | 6 | 6p25.1 | CCAATACAAGGTACAAAGTCCTGA | 54 | 6837845 | 6892846 | 40782 | Deichmann, 2007 | 25-32 |
| SH4 | 7 | 7q31.2 | TTAAAACACTGTACACCATTTTGA | 55 | 114606124 | 115051621 | 77337 | Schwarzwaelder, 2007 | 33-40 |
| SH6 | 21 | 21q21.1 | TTAATACCCCGTACCTAATATTGC | 59 | 17265069 | 18343198 | 96099 | Schwarzwaelder, 2007 | 76-85 |
| SH8 | 5 | chr5 | ACAATCTGAGGTAAGTAATACTGA | 104 | 20572231 | 20536474 | 50714 | Aiuti, 2007 | 88 |
| SH19 | 22 | chr22 | GCAAAACATTGTAAGACATGCCAA | 99 | 46815611 | 20536474 | 97664 | Aiuti, 2007 | 91 |

TABLE A″ Target Example of  position Target Sequence Human onComprised within  Name chromosome chromosome the locus: SEQ ID NO: SH18 5 20634138 CTTACCCCACGTACCACAGACTGT 105 SH31 14 65874037TTGTAATGTCTTACAAGGTTTTAA 106 SH38 10 3983262 CTGGGATGTCTCACGACAGCATGG107 SH39 11 104531937 TCCTTCTGTCTTAAGAGATTTATC 108 SH41  5 18182572CCTCTCTTAGGTGAGACGGTACAT 109 SH42  5 20466837 TATATCCCATGTGAGACATGCAGT110 SH43 18 37446750 TAAATACGTCTTACATTATTTTGC 111 SH44  6 147302518AAGAAATGTCTCACAGAATTTTAC 112 SH45  8 24854461 CAGATATGTCTTAAAATGTCACTG113 SH46 19 12036102 ACCAGATGTCGTGAGACGGGGGAG 114 SH47  8 25002335GCAGGCTTATTCACCAGGGTTTAC 115 SH48 10 101896036 TTGAAATTAGTTACAGGAGGTTAT116 SH49 13 68191409 ATAATACAATTTACCTAATCCTAT 117 SH50  1 47411545CCCGGCCCCTTTAATCCATCTTAA 118 SH51 21 30011146 TTGAGCTCACTCACATGGTCTCAG119 SH52 12 76131166 CTCCACTGTCTTACCTAATCCAGC 120 SH70 12 796917CATGTATGATTTACATCGGTTTGA 121 SH71  2 231579954 GTTGTATTATTTACCTCAGATGAA122 SH72  6 25192217 TTTGGATGCTGTAAAGAATTTCCT 123 SH73  8 78807830ATAAAACGACTTACAAGGTCTGAA 124 SH74 19 29033855 TTCAGATCTCGTACAGGGGATGAC125 SH75  8 114771707 CTGCCATAGGGTAACTGAGTCAAT 126 RIS Cleaved positionby mega- according nucleases Close to  to RIS (example) a RIS at GRCh37/RIS described of SEQ Name position: hg19 distance in: ID NO: PlasmidsSH18 20536474 20536474 97664 Aiuti, 2007 127 pCLS5518 128 pCLS5519 129pCLS5520 130 pCLS5521 SH31 64841555 65771802 102235 Recchia, 2006 131pCLS3904 132 pCLS4076 SH38 3929865 3939865 43397 Mavilio F, 2006 SH39104003035 104465318 66619 Schwartzwaelder,  133 pCLS6038 2007 134pCLS6039 SH41 18180277 18134776 47796 Schwartzwaelder,  135 pCLS51872007 136 pCLS5188 SH42 20581361 20535860 69023 Schwartzwaelder,  137pCLS5549 2007 138 pCLS5550 SH43 35630950 37378963 67787Schwartzwaelder,  139 pCLS5594 2007 140 pCLS5595 SH44 147201063147220493 82025 Schwartzwaelder,  141 pCLS5868 2007 142 pCLS5869 SH4524923302 24867385 12924 Mavilio F, 2006 SH46 11713157 11852157 183945Mavilio F, 2006 SH47 24923302 24867385 134950 
Mavilio F, 2006 SH48101755754 101765764 130272 Mavilio F, 2006 SH49 65947183 68149182 42227Schwartzwaelder,  2007 SH50 46928138 47216118 195427 Mavilio F, 2006SH51 28929744 30007873 3273 Mavilio F, 2006 SH52 74339720 76053453 77713Mavilio F, 2006 143 pCLS5870 144 pCLS5871 SH70 708202 837941 41024Recchia, 2006 145 pCLS5957 SH71 231351771 231526266 53688 Recchia, 2006146 pCLS5958 SH72 25101289 24993310 198907 Recchia, 2006 147 pCLS5959SH73 78989339 78939377 131547 Deichmann, 2007 148 pCLS5960 SH74  33661180 28969340 64515 Deichmann, 2007 149 pCLS5961 SH75  114711413114754830 16877 Deichmann, 2007 150 pCLS5962

TABLE B Dist Dist Left Gene Left Right Gene Right Name Left Gene1Description1 Kb1 Right Gene1 Description1 Kb1 SH3 LY86 MD-1, RP105- 197RREB1 ras responsive 330 associated element binding protein 1 isoform 1SH4 MDFIC MyoD family 318 TFEC transcription 606 inhibitor domain factorcontaining EC isoform b protein isoform p40 SH6 C21orf34 hypothetical675 CXADR coxsackie virus 446 protein and adenovirus LOC388815 receptorisoform b precursor SH12 LOC728767 hypothetical 41 COL4A1 alpha 1 typeIV 302 protein collagen preproprotein preproprotein SH13 ROBO1roundabout 1 919 LOC728290 hypothetical 484 isoform a protein SH19LOC100289420 hypothetical 1106 FAM19A5 family with 208 protein sequenceXP_002343824 similarity 19 (chemokine (C-C motif)- like), member A5isoform 1 SH20 KRR1 HIV-1 rev 120 LOC100289143 hypothetical 307 bindingprotein 2 protein XP_002343241 SH21 GADL1 glutamate 236 STT3B source of402 decarboxylase- immunodominant like 1 MHC-associated peptides SH33DEFB133 beta-defensin 7 DEFB114 beta-defensin 114 4 133 SH7 FANCLFanconi anemia, 685 LOC730134 similar to 312 complementation hCG1815165group L isoform 2 SH8 CDH18 cadherin 18, 647 LOC100288118 hypothetical988 type 2 protein preproprotein XP_002342537 preproprotein

TABLE B′ Dist Dist Left Gene Left Right Gene Right Name Left Gene1Description 1 Kb1 Right Gene1 Description1 Kb1 SH3 LOC652960 na 56 RREB1ras responsive 256 element binding protein 1 isoform 2 SH4 MDFIC MyoDfamily 315 LOC100287693 na 162 inhibitor domain containing proteinisoform p40 SH6 RPS26P5 na 945 RPL39P40 na 433 SH8 NUP50P3 na 179LOC728411 na 973 SH19 LOC100289420 hypothetical 1105 FAM19A5 family with208 protein sequence XP_002343824 similarity 19 (chemokine (C-C motif)-like), member A5 isoform 2

TABLE B″ Dist Dist Left Gene Left Right Gene Right Name Left Gene1Description1 Kb1 Right Gene1 Description1 Kb1 SH18 NUP50P3 na 328LOC728411 na 825 SH31 PTBP1P na 127 LOC645431 na 3 SH38 LOC727894hypothetical 5 LOC100128356 na 498 protein SH39 DDI1 DDI1, DNA-damage622 CASP12 na 225 inducible 1, homolog 1 SH41 RPL36AP21 na 132 RPL32P14na 858 SH42 NUP50P3 na 160 LOC728411 na 992 SH43 RPL7AP66 na 531RPL17P45 na 277 SH44 LOC729176 na 177 STXBP5 syntaxin binding 222protein 5 (tomosyn) isoform a SH45 NEFL neurofilament, 40 DOCK5dedicator of 187 light cytokinesis 5 polypeptide 68 kDa SH46 VN2R15P na9 VN2R21P na 27 SH47 NEFL neurofilament, 188 DOCK5 dedicator of 39 lightcytokinesis 5 polypeptide 68 kDa SH48 LOC644566 na 18 LOC644573 na 6SH49 RPSAP53 na 349 LOC390411 na 214 SH50 CYP4A11 cytochrome P450, 4CYP4X1 cytochrome 77 family 4, P450, family 4, subfamily A, subfamily X,polypeptide 11 polypeptide 1 SH51 NCRNA00161 na 98 N6AMT1 N-6 adenine-233 specific DNA methyltransferase 1 isoform 1 SH52 RPL10P13 na 48LOC100289143 hypothetical 201 protein XP_002343241 SH70 LOC100049716 na41 LOC100132369 hypothetical 64 protein SH71 LOC646839 na 141 ITM2Cintegral 149 membrane protein 2C isoform 3 SH72 LOC100132239 na 38LOC100129757 na 26 SH73 LOC100289199 na 878 PKIA cAMP-dependent 620protein kinase inhibitor alpha isoform 7 SH74 LOC100131694 na 558LOC100129507 na 184 SH75 RPL18P7 na 382 TRPS1 zinc finger 1648transcription factor TRPS1

TABLE C Dist Dist Left Gene Left Right Gene Right Name Left Gene2Description2 Kb2 Right Gene2 Description2 Kb2 SH3 F13A1 coagulation 533LOC100288758 hypothetical 378 factor XIII A1 protein subunitXP_002342653 precursor SH4 FOXP2 forkhead box P2 644 TES testin isoform1 876 isoform III SH6 C21orf34 hypothetical 996 BTG3 B-cell 527 proteintranslocation LOC388815 gene 3 isoform b isoform a SH12 IRS2 insulinreceptor 63 COL4A2 alpha 2 type IV 459 substrate 2 collagenpreproprotein preproprotein SH13 ROBO2 roundabout, 2863 GBE1 glucan 982axon guidance (1,4-alpha-), receptor, branching homolog 2 enzyme 1isoform ROBO2a SH19 TBC1D22A TBC1 domain 1108 FAM19A5 family with 295family, member sequence 22A similarity 19 (chemokine (C-C motif)- like),member A5 isoform 2 SH20 GLIPR1 GLI 133 LOC100131830 hypothetical 382pathogenesis- protein related 1 precursor SH21 TGFBR2 transforming 439OSBPL10 oxysterol- 532 growth factor, binding beta receptor IIprotein-like isoform A protein 10 precursor SH33 CRISP1 acidicepididymal 99 DEFB113 beta-defensin 113 13 glycoprotein-like 1 isoform 2precursor SH7 VRK2 vaccinia related 767 BCL11A B-cell CLL/ 1526 kinase 2isoform 6 lymphoma 11A isoform 3 SH8 LOC391769 similar to 2830 CDH12cadherin 12, 1266 HIStone family type 2 member (his-72) preproproteinpreproprotein

TABLE C′ Dist Dist Left Gene Left Right Gene Right Name Left Gene2Description2 Kb2 Right Gene2 Description2 Kb2 SH3 LY86 MD-1, RP105- 196LOC100288758 hypothetical 376 associated protein XP_002342653 SH4 FOXP2forkhead box P2 643 TFEC transcription 600 isoform III factor EC isoforma SH6 VDAC2P na 971 CXADR coxsackie virus 446 and adenovirus receptorprecursor SH8 CDH18 cadherin 18, 646 LOC100288118 hypothetical 987 type2 protein preproprotein XP_002342537 preproprotein SH19 TBC1D22A TBC1domain 1107 LOC100128946 hypothetical 614 family, member protein 22A

TABLE C″ Dist Dist Left Gene Left Right Gene Right Name Left Gene2Description2 Kb2 Right Gene2 Description2 Kb2 SH18 CDH18 cadherin 18,794 LOC100288118 hypothetical 839 type 2 protein preproproteinXP_002342537 preproprotein SH31 RPL36AP2 na 137 FUT8 fucosyltransferase3 8 isoform c SH38 LOC100130652 hypothetical 112 LOC100216001 na 709protein SH39 PDGFD platelet derived 496 LOC643733 na 242 growth factor Disoform 1 precursor SH41 LOC100133112 na 488 LOC646273 na 1050 SH42CDH18 cadherin 18, 627 LOC100288118 hypothetical 1006 type 2 proteinpreproprotein XP_002342537 preproprotein SH43 LOC647946 na 114 KC6 na1613 SH44 C6orf103 hypothetical 165 LOC442266 na 425 protein LOC79747SH45 LOC100129717 na 40 GNRH1 gonadotropin- 422 releasing hormone 1precursor SH46 ZNF69 zinc finger 10 ZNF763 zinc finger 39 protein 69protein 440 like SH47 LOC100129717 na 188 GNRH1 gonadotropin- 274releasing hormone 1 precursor SH48 CPN1 carboxypeptidase 54 ERLIN1 ERlipid raft 13 N, polypeptide 1 associated 1 precursor SH49 LOC730236hypothetical 385 OR7E111P na 284 protein SH50 CYP4Z2P na 45 CYP4Z1cytochrome P450 121 4Z1 SH51 C21orf94 na 615 HSPD1P7 na 248 SH52LOC100129649 na 135 LOC100131830 hypothetical 276 protein SH70 NINJ2ninjurin 2 24 WNK1 WNK lysine 65 deficient protein kinase 1 SH71 HMGB1L3na 199 GPR55 G protein-coupled 192 receptor 55 SH72 NUP50P2 na 50RPL21P68 na 69 SH73 PXMP3 peroxin 2 895 FAM164A hypothetical 770 proteinLOC51101 SH74 LOC100132081 na 640 LOC148145 na 422 SH75 LOC100289099 na1220 EIF3H eukaryotic 2885 translation initiation factors, subunitsgamma, 40 kDa

TABLE D Dist Dist Left Gene Left Right Gene Right Name Left Gene3Description3 Kb3 Right Gene3 Description3 Kb3 SH3 NRN1 neuritin 845LOC100288790 hypothetical 417 precursor protein XP_002342654 SH4 FOXP2forkhead box P2 644 TES testin isoform 2 900 isoform II SH6 USP25ubiquitin 1189 C21orf91 early 726 specific undifferentiated peptidase 25retina and lens isoform 2 SH12 MYO16 myosin heavy 642 RAB20 RAB20,member 675 chain Myr 8 RAS oncogene family SH13 ROBO2 roundabout, 2863LOC100289598 hypothetical 4448 axon guidance protein receptor,XP_002342405 homolog 2 isoform ROBO2b SH19 CERK ceramide kinase 1543LOC100128946 hypothetical 616 protein SH20 GLIPR1L2 GLI 209 PHLDA1pleckstrin 398 pathogenesis- homology-like related 1 like 2 domain,family A, member 1 SH21 RBMS3 RNA binding 1127 ZNF860 zinc finger 859motif, single protein 860 stranded interacting protein 3 isoform 1 SH33CRISP1 acidic epididymal 99 DEFB110 beta-defensin 110 53glycoprotein-like 1 isoform 1 precursor SH7 VRK2 vaccinia related 767BCL11A B-cell 1526 kinase 2 isoform CLL/lymphoma 2 11A isoform 2 SH8LOC391767 similar to TBP- 2851 PRDM9 PR domain 3023 associated factorcontaining 9 11

TABLE D′ Dist Dist Left Gene Left Right Gene Right Name Left Gene3Description3 Kb3 Right Gene3 Description3 Kb3 SH3 LOC643875 na 316LOC100288790 hypothetical 416 protein XP_002342654 SH4 RPL36P13 na 1036TES testin isoform 2 876 SH6 C21orf34 hypothetical 459 BTG3 B-cell 526protein translocation LOC388815 gene 3 isoform a isoform b SH8 LOC646273na 1251 GUSBP1 na 1005 SH19 CERK ceramide kinase 1542 LOC100287247hypothetical 768 protein XP_002343807

TABLE D″ Dist Dist Left Gene Left Right Gene Right Name Left Gene3Description3 Kb3 Right Gene3 Description3 Kb3 SH18 LOC646273 na 1399GUSBP1 na 857 SH31 RPL21P7 na 139 RPL21P8 na 60 SH38 KLF6 Kruppel-like155 LOC338588 na 715 factor 6 SH39 LOC100190922 na 1031 CASP4 caspase 4281 isoform gamma precursor SH41 LOC391769 similar to 526 CDH18 cadherin18, 1290 HIStone family type 2 member (his-72) preproproteinpreproprotein SH42 LOC646273 na 1232 GUSBP1 na 1024 SH43 RPL12P40 na2193 NPM1P1 na 1922 SH44 RAB32 RAB32, 426 SAMD5 sterile alpha 527 memberRAS motif domain oncogene family containing 5 SH45 LOC100289018hypothetical 81 KCTD9 potassium 430 protein channel XP_002342868tetramerisation domain containing 9 SH46 VN2R14P na 53 ZNF433 zincfinger 89 protein 433 SH47 LOC100289018 hypothetical 229 KCTD9 potassium283 protein channel XP_002342868 tetramerisation domain containing 9SH48 NCRNA00093 na 177 CHUK conserved helix- 52 loop-helix ubiquitouskinase SH49 PCDH9 protocadherin 9 386 OR7E33P na 293 isoform 1 precursorSH50 LOC100132680 na 45 LOC100132432 na 123 SH51 NCRNA00113 na 887LOC391276 na 262 SH52 KRR1 HIV-1 rev binding 225 PHLDA1 pleckstrin 288protein 2 homology-like domain, family A, member 1 SH70 B4GALNT3 beta1,4-N-acetyl- 125 HSN2 hereditary 179 galactosaminyl- sensorytransferase- neuropathy, type transferase-III II SH71 SP100 nuclearantigen 169 LOC100289170 na 232 Sp100 isoform 2 SH72 CMAH na 54LOC100128495 na 80 SH73 ZFHX4 zinc finger 1028 IL7 interleukin 7 837homeodomain 4 precursor SH74 LOC642290 na 715 UQCRFS1 ubiquinol- 664cytochrome c reductase, Rieske iron-sulfur polypeptide 1 SH75 CSMD3 CUBand Sushi 322 UTP23 UTP23, small 3007 multiple domains 3 subunit (SSU)isoform 2 processome component, homolog

The locus according to the invention may also correspond to any one of the SH101, SH106, SH107, SH102, SH105, SH103, SH104, SH113, SH109, SH112, SH108, SH110, SH114, SH116, SH111, SH115, SH121, SH120, SH122, SH117, SH118, SH119, SH123, SH126, SH128, SH129, SH124, SH131, SH125, SH127 and SH130 loci, which are described in Tables E and F below.

Table E provides the location of the locus within the human genome, a target sequence comprised within the locus, the location of the closest RIS together with a reference to a publication describing that RIS, the distance between said target and the closest RIS, and examples of endonucleases according to the invention that cleave the locus.
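For illustration only, the distance column of Table E can be reproduced as the absolute difference between the start position of the target and the position of the closest RIS on the same chromosome. The short Python sketch below uses three coordinate pairs copied from Table E; the function name `distance_to_ris` is hypothetical and is not part of the specification.

```python
def distance_to_ris(target_start: int, ris_position: int) -> int:
    """Absolute distance, in bases, between a target start and a RIS
    located on the same chromosome (assumed reading of Table E)."""
    return abs(ris_position - target_start)


# (target start, closest RIS position) pairs copied from Table E
examples = {
    "SH101": (72293606, 72478871),
    "SH108": (173734739, 173720808),
    "SH105": (64610385, 64560662),
}

distances = {name: distance_to_ris(t, r) for name, (t, r) in examples.items()}
for name, dist in distances.items():
    print(name, dist)
```

The values obtained in this way agree with the distances listed for the same loci in Table E.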

Table F provides information about the nearest genes that are located immediately upstream (at 5′) and downstream (at 3′) of the locus according to the invention. The distance indicates the distance between the target sequence and the nearest coding sequence of the gene.

The locations of the loci, of the targets within these loci and of the genes are given in Tables E and F according to the GRCh36.3/hg19 version of the human genome assembly.

TABLE E

| Name | Human chromosome | Target position on chromosome (start; V36.3) | Example of Target Sequence Comprised within the locus | SEQ ID NO: |
|---|---|---|---|---|
| SH101 | 3 | 72293606 | CCTACACCCTGTAAGATGGCTAGT | 151 |
| SH106 | 13 | 103230446 | CTAAAATCATGTAAGTTGTATTAT | 152 |
| SH107 | 13 | 103240747 | TAAACATTTTGTACAGAATCTCAG | 153 |
| SH102 | 4 | 143846381 | ATGAGATAATGTACAAGGTTTTGT | 154 |
| SH105 | 12 | 64610385 | CAGGGACTATTTACAAAAGATTGA | 155 |
| SH103 | 4 | 143907910 | CCAAACCTAGGTAAGAGATATGAA | 156 |
| SH104 | 7 | 131856646 | TATAGATCAAGTAACAAGTGTAAT | 157 |
| SH113 | 8 | 66935276 | TTTTACTGTCTTACCTAGTTTTGC | 158 |
| SH109 | 3 | 72674929 | TCAATCTCACTTACAAAGTTGTGA | 159 |
| SH112 | 7 | 127627660 | CTAGGATGTAGTACAGGGTGCTAT | 160 |
| SH108 | 3 | 173734739 | AATATCTCATGTAACACATATTGC | 161 |
| SH110 | 5 | 14051421 | TTACTCCCATTTACAAGAGCAGAG | 162 |
| SH114 | 10 | 11537739 | ACCAGACCTTGTAAGTTATACAGA | 163 |
| SH116 | 21 | 14663030 | ATAAAATAAGTTACAGAGTTACAA | 164 |
| SH111 | 7 | 127808719 | ACTTCCTGTTTTACAAGGTGTAAT | 165 |
| SH115 | 12 | 95084648 | CCTGGATATGTTACAACAGAAAGC | 166 |
| SH121 | 8 | 8897353 | TTTCTCTCAGGTAAAACAGTCCAC | 167 |
| SH120 | 8 | 24344273 | GTAAGCTATTGTAAGAAATGCAAG | 168 |
| SH122 | 17 | 58931643 | ATGAGATGATGTACAAAGTCCTAG | 169 |
| SH117 | 1 | 223618330 | ACTGTATTTTGTAAAGTGTCCCTC | 170 |
| SH118 | 4 | 8209666 | TCTTCATGTTGTACCTTGTCCCCT | 171 |
| SH119 | 5 | 138660535 | ATCATCTGAGGTAAAGAGTTCTGA | 172 |
| SH123 | 19 | 40227362 | GCTCTCTCTGGTACCTGATAGTGA | 173 |
| SH126 | 2 | 194307577 | ACAAACTCTTTTACGGGATTCAGG | 174 |
| SH128 | 2 | 193954229 | TTCACATGCTTTACGAAAGTTAGC | 175 |
| SH129 | 2 | 194043922 | CCTACATTTCGTAAGACATCTATT | 176 |
| SH124 | 4 | 159540469 | GCAAACTGTGGTACCTAGGCCCGT | 177 |
| SH131 | 1 | 201630446 | TCGAGCCACTGTACCTAGTTTTGT | 178 |
| SH125 | 17 | 10025853 | ACAGGATCCAGTAAAGGAGCCGGC | 179 |
| SH127 | 2 | 20001992 | GCTGTACTATTTACGGTATTCAAT | 180 |
| SH130 | 16 | 56151416 | ATAAACTTCGGTAAGACATCTCAA | 181 |

TABLE E (continued)

| Name | RIS position according to GRCh36.3/hg19 | Distance to RIS (bases) | RIS described in: | Cleaved by meganucleases (examples) of SEQ ID NO: | Plasmids |
|---|---|---|---|---|---|
| SH101 | 72478871 | 185265 | Gabriel et al, 2009 | 182 | pCLS7518 |
| SH106 | 103311358 | 80912 | Gabriel et al, 2009 | 183 | pCLS7523 |
| SH107 | 103311358 | 70611 | Gabriel et al, 2009 | 184 | pCLS7524 |
| SH102 | 143708544 | 137837 | Gabriel et al, 2009 | 185 | pCLS7519 |
| SH105 | 64560662 | 49723 | Gabriel et al, 2009 | 186 | pCLS7522 |
| SH103 | 143708544 | 199366 | Gabriel et al, 2009 | 187 | pCLS7520 |
| SH104 | 131765633 | 91013 | Gabriel et al, 2009 | 188 | pCLS7521 |
| SH113 | 67019410 | 84134 | Gabriel et al, 2009 | 189 | pCLS7530 |
| SH109 | 72478871 | 196058 | Gabriel et al, 2009 | 190 | pCLS7526 |
| SH112 | 127698957 | 71297 | Gabriel et al, 2009 | 191 | pCLS7529 |
| SH108 | 173720808 | 13931 | Gabriel et al, 2009 | 192 | pCLS7525 |
| SH110 | 14197567 | 146146 | Gabriel et al, 2009 | 193 | pCLS7527 |
| SH114 | 11694871 | 157132 | Gabriel et al, 2009 | 194 | pCLS7531 |
| SH116 | 14814623 | 151593 | Gabriel et al, 2009 | 195 | pCLS7533 |
| SH111 | 127698957 | 109762 | Gabriel et al, 2009 | 196 | pCLS7528 |
| SH115 | 95131508 | 46860 | Gabriel et al, 2009 | 197 | pCLS7532 |
| SH121 | 8837115 | 60238 | Gabriel et al, 2009 | 198 | pCLS7538 |
| SH120 | 24200341 | 143932 | Gabriel et al, 2009 | 199 | pCLS7537 |
| SH122 | 59056021 | 124378 | Gabriel et al, 2009 | 200 | pCLS7539 |
| SH117 | 223700385 | 82055 | Gabriel et al, 2009 | 201 | pCLS7534 |
| SH118 | 8250751 | 41085 | Gabriel et al, 2009 | 202 | pCLS7535 |
| SH119 | 138751654 | 91119 | Gabriel et al, 2009 | 203 | pCLS7536 |
| SH123 | 40144506 | 82856 | Gabriel et al, 2009 | 204 | pCLS7540 |
| SH126 | 194148379 | 159198 | Gabriel et al, 2009 | 205 | pCLS7543 |
| SH128 | 194148379 | 194150 | Gabriel et al, 2009 | 206 | pCLS7545 |
| SH129 | 194148379 | 104457 | Gabriel et al, 2009 | 207 | pCLS7546 |
|  |  |  |  | 208 | pCLS7547 |
| SH124 | 159391564 | 148905 | Gabriel et al, 2009 | 209 | pCLS7541 |
| SH131 | 201525001 | 105445 | Gabriel et al, 2009 | 210 | pCLS7549 |
| SH125 | 9964030 | 61823 | Gabriel et al, 2009 | 211 | pCLS7542 |
| SH127 | 20112551 | 110559 | Gabriel et al, 2009 | 212 | pCLS7544 |
| SH130 | 56136054 | 15362 | Gabriel et al, 2009 | 213 | pCLS7548 |

TABLE F

| Name | Left Gene¹ | Dist Left Kb¹ | Right Gene¹ | Dist Right Kb¹ |
|---|---|---|---|---|
| SH101 | PROK2 | 380 | RYBP | 213 |
| SH106 | SLC10A2 | 713 | DAOA | 1500 |
| SH107 | SLC10A2 | 724 | DAOA | 1500 |
| SH113 | PDE7A | 19 | DNAJC5B | 161 |
| SH109 | RYBP | 96 | SHQ1 | 208 |
| SH112 | SND1 | 100 | LEP | 41 |
| SH108 | TNFSF10 | 11 | AADACL1 | 96 |
| SH110 | DNAH5 | 54 | TRIO | 146 |
| SH114 | CUGBP2 | 120 | USP6NL | 5 |
| SH116 | ABCC13 | 66 | HSPA13 | 3 |
| SH111 | PRRT4 | 25 | IMPDH1 | 11 |
| SH115 | LTA4H | 151 | ELK3 | 27 |
| SH121 | MFHAS1 | 110 | ERI1 | 0.37 |
| SH120 | ADAMDEC1 | 25 | ADAM7 | 10 |
| SH122 | ACE | 3 | KCNH6 | 24 |
| SH126 | TMEFF2 | 1500 | SLC39A10 | 2000 |
| SH128 | TMEFF2 | 1400 | SLC39A10 | 2100 |
| SH129 | TMEFF2 | 1300 | SLC39A10 | 2200 |
| SH124 | TMEM144 | 145 | RXFP1 | 122 |
| SH131 | FMOD | 44 | PRELP | 81 |

The locus according to the invention may also correspond to any one of the SH125, SH127, SH130, SH102, SH105, SH103, SH104, SH117, SH118, SH119 and SH123 loci, which are described in Table G below.

Table G provides examples of target sequences located in introns of the indicated genes, and examples of endonucleases according to the invention that cleave said intronic loci.

TABLE G

| Name | Example of Target Sequence Comprised within the locus | Hit position | Gene | Intron |
|---|---|---|---|---|
| SH125 | ACAGGATCCAGTAAAGGAGCCGGC | intronic | GAS7 | 1 |
| SH127 | GCTGTACTATTTACGGTATTCAAT | intronic | WDR35 | 18 |
| SH130 | ATAAACTTCGGTAAGACATCTCAA | intronic | GPR114 | 1 |
| SH102 | ATGAGATAATGTACAAGGTTTTGT | intronic | INPP4B | 2 |
| SH105 | CAGGGACTATTTACAAAAGATTGA | intronic | HMGA2 | 3 |
| SH103 | CCAAACCTAGGTAAGAGATATGAA | intronic | INPP4B | 1 |
| SH104 | TATAGATCAAGTAACAAGTGTAAT | intronic | PLXNA4 | 1 |
| SH117 | ACTGTATTTTGTAAAGTGTCCCTC | intronic | DNAH14 | 76 |
| SH118 | TCTTCATGTTGTACCTTGTCCCCT | intronic | ABLIM2 | 1 |
| SH119 | ATCATCTGAGGTAAAGAGTTCTGA | intronic | MATR3 | 5 |
| SH123 | GCTCTCTCTGGTACCTGATAGTGA | intronic | HPN | 3 |

The locus according to the invention may also contain any one of SH11, SH12, SH13, SH17, SH19, SH20, SH21, SH23, SH33, SH34, SH40, SH53, SH54, SH55, SH56, SH57, SH58, SH59, SH60, SH61, SH62, SH65, SH67, SH68 and SH69, which are given in Table H below.

Table H provides target sequences comprised within these loci as well as examples of endonucleases according to the invention that cleave these target sequences.

TABLE H

| Name | Sequence | SEQ ID NO: | Cleaved by meganucleases (examples) of SEQ ID NO: | Plasmids |
|---|---|---|---|---|
| SH11 | AGAAGCCCAGGTAAAACAGCCTGG | 214 | 235 | pCLS3895 |
|  |  |  | 236 | pCLS4664 |
| SH12 | ATAAAACAAGTCACGTTATTTTGG | 215 | 237 | pCLS3896 |
|  |  |  | 238 | pCLS3915 |
|  |  |  | 239 | pCLS6445 |
| SH13 | ATTACACTCTTTAAGTGATTTTAA | 216 | 240 | pCLS3897 |
|  |  |  | 241 | pCLS6446 |
| SH17 | CTAGGCTGGATTACAGCGGCTTGA | 217 | 242 | pCLS3898 |
| SH19 | GCAAAACATTGTAAGACATGCCAA | 218 | 243 | pCLS3899 |
|  |  |  | 244 | pCLS7278 |
|  |  |  | 245 | pCLS7279 |
| SH20 | GCTGGCTGCTTCACATTGGAGAGA | 219 | 246 | pCLS3900 |
| SH21 | TAGAAATCTGTTAAAAGAGATGAT | 220 | 247 | pCLS3901 |
|  |  |  | 248 | pCLS4666 |
|  |  |  | 249 | pCLS4667 |
| SH23 | TCAAACCATTGTACTCCAGCCTGG | 221 | 250 | pCLS3902 |
|  |  |  | 251 | pCLS6447 |
| SH33 | TTTTCATCACTTAAAGTGTTTTAA | 222 | 252 | pCLS3905 |
|  |  |  | 253 | pCLS4077 |
|  |  |  | 254 | pCLS4668 |
|  |  |  | 255 | pCLS4669 |
| SH34 | TTTTCCTGTCTTACCAGGTTTTGT | 223 | 256 | pCLS3906 |
| SH40 | GTCTTCTGTCTTAAGACATAAAAT | 224 | 257 | pCLS5427 |
|  |  |  | 258 | pCLS5565 |
|  |  |  | 259 | pCLS5566 |
| SH53 | GTAAAATGGATTAAAAGAGGGAAG | 225 | 260 | pCLS4773 |
| SH54 | CCAAAACACGTTAAAAAAGTTTAA | 226 | 261 | pCLS4774 |
| SH55 | ATAATATTCTGTGACTCATGGCAA | 227 | 262 | pCLS4775 |
| SH56 | AGTAGATCTTTTAAAAGATTTTAA | 228 | 263 | pCLS4776 |
| SH57 | ATAAAACCACTTAAGACATAGGAA | 229 | 264 | pCLS4777 |
| SH58 | ACTTGCTGTCTTAACAGAGAAGAT | 230 | 265 | pCLS4778 |
| SH59 | ATGTACCTCTTTAAAACAGATGAA | 231 | 266 | pCLS4779 |
| SH60 | CTCTTCTCCTGTGACAGAGTTCTG | 232 | 267 | pCLS4780 |
| SH61 | TCCAGCCCCTGTGACAGAGTGAGA | 233 | 268 | pCLS5333 |
| SH62 | ACAAAATATTTTAAGGGAGCCAAA | 234 | 269 | pCLS5334 |
|  |  |  | 270 | pCLS5335 |
| SH65 | CTCACCTGTCTCACAAGGGAGGGA | 271 | 275 | pCLS5336 |
| SH67 | CTACTACCATGTGACTGGTTGTAG | 272 | 276 | pCLS5337 |
| SH68 | GCTGCACGTTTTACATGAGAGTAA | 273 | 277 | pCLS5955 |
| SH69 | TCAGACTTCTTTACCTCATTTGAT | 274 | 278 | pCLS5956 |

In a specific embodiment, the locus according to the invention is the SH3 locus. The term “SH3 locus” refers to the region of human chromosome 6 that is located at about 120 kb centromeric to the gene encoding the lymphocyte antigen 86 (see e.g. the world wide web site ncbi.nlm.nih.gov/projects/mapview/maps.cgi?TAXID=9606&CHR=6&MAPS=ideogr%2Ccntg-r%2CugHs%2Cgenes&BEG=6432845&END=7232845&thmb=on, which shows the 6,430K-7,230K region of chromosome 6), and to homologous regions in other species. More precisely, the SH3 locus extends from position 6850510 to 6853677 of the sequence shown in NC_000006.11. It comprises a sequence of SEQ ID NO: 54.

In another specific embodiment, the locus according to the invention is the SH4 locus. The SH4 locus is defined herein as the region of human chromosome 7 that is located at about 320 kb telomeric to the MyoD family inhibitor domain containing locus (MDFIC), or to the homologous region in another species (see e.g. the world wide web site ncbi.nlm.nih.gov/projects/mapview/maps.cgi?TAXID=9606&CHR=7&MAPS=ideogr,cntg-r,ugHs,genes[113908811.00%3A114908811.00]&CMD=DN, which shows the 114,660K-115,660K region of chromosome 7). More precisely, the SH4 locus extends from position 114972751 to 114976380 of the sequence shown in NC_000007.13. It comprises a sequence of SEQ ID NO: 55.
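By way of a minimal sketch, the coordinates given above for the SH3 and SH4 loci can be held in a small data structure in order to derive the length of each locus or to test whether a genomic position falls within it. The class `Locus` and its methods are hypothetical helpers, and the sketch assumes that the stated start and end positions are 1-based and inclusive, which the specification does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    accession: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    def length(self) -> int:
        """Number of bases spanned by the locus."""
        return self.end - self.start + 1

    def contains(self, position: int) -> bool:
        """True if the position lies within the locus boundaries."""
        return self.start <= position <= self.end


# Coordinates as stated in the description of the SH3 and SH4 loci
SH3 = Locus("SH3", "NC_000006.11", 6850510, 6853677)
SH4 = Locus("SH4", "NC_000007.13", 114972751, 114976380)

for locus in (SH3, SH4):
    print(locus.name, locus.accession, locus.length())
```

Under the inclusive-coordinate assumption, the SH3 and SH4 loci span 3168 and 3630 bases respectively.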

As used herein, the term “transgene” refers to a sequence encoding a polypeptide. Preferably, the polypeptide encoded by the transgene is either not expressed, or expressed but not biologically active, in the cell, tissue or individual in which the transgene is inserted. Most preferably, the transgene encodes a therapeutic polypeptide useful for the treatment of an individual.

In the frame of the present invention, the individual may be a human or non-human animal. The individual is preferably a human. Alternatively, the individual can be a non-human animal, preferably a vertebrate and/or a mammalian animal such as e.g. a mouse, a rat, a rabbit, a Chinese hamster, a guinea pig or a monkey. The cells and tissues according to the invention are preferably derived from such human or non-human animals.

Endonucleases According to the Invention that are Derived from I-CreI

The variant endonuclease according to the invention can for example be derived:

- either from the wild-type I-CreI meganuclease, which is a homodimeric protein comprising two monomers, each of these monomers comprising a sequence of SEQ ID NO: 1 or the sequence shown in SwissProt Accession No. P05725;
- or from an I-CreI meganuclease comprising two monomers, each of these monomers comprising a sequence of SEQ ID NO: 42. Such an I-CreI meganuclease, which recognizes the wild-type target sequence, has been shown to be suitable for engineering endonucleases with novel specificities.

Therefore, the invention pertains to a dimeric I-CreI protein comprising or consisting of two monomers, each monomer comprising or consisting of a sequence at least 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to SEQ ID NO: 1 or to SEQ ID NO: 42, wherein said dimeric I-CreI protein is capable of cleaving a target sequence located within a safe harbor locus.

Preferably, the target sequence neither comprises nor consists of a sequence of SEQ ID NO: 4.

Most preferably, the dimeric I-CreI protein according to the invention is a heterodimeric protein.

By a protein having a sequence at least, for example, 95% “identical” to a query sequence of the present invention, it is intended that the sequence of the protein is identical to the query sequence except that the sequence may include up to five amino acid mutations per 100 amino acids of the query sequence. In other words, to obtain a protein having a sequence at least 95% identical to a query sequence, up to 5% (5 of 100) of the amino acids of the sequence may be inserted, deleted, or replaced with another amino acid. The “needle” program, which uses the Needleman-Wunsch global alignment algorithm (Needleman and Wunsch, 1970 J. Mol. Biol. 48:443-453) to find the optimum alignment (including gaps) of two sequences when considering their entire length, may for example be used. The needle program is for example available on the ebi.ac.uk world wide web site. The percentage of identity in accordance with the invention can thus be calculated using the EMBOSS::needle (global) program with a “Gap Open” parameter equal to 10.0, a “Gap Extend” parameter equal to 0.5, and a Blosum62 matrix.
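As a hedged illustration of how a percentage of identity may be derived once a global alignment has been obtained, the Python sketch below counts identical columns in two already-aligned sequences and divides by the alignment length, gaps included. It does not perform the Needleman-Wunsch alignment itself and does not call the needle program; the function name `percent_identity` and the short toy sequences are assumptions made for the example only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue.

    Both inputs are assumed to be rows of one global alignment,
    with '-' marking gap positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy 20-residue sequences differing by a single substitution
print(percent_identity("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWA"))
# Toy alignment in which one residue is deleted from the second sequence
print(percent_identity("ACDEFGHIKL", "ACDE-GHIKL"))
```

In this sketch, one difference over 20 aligned positions corresponds to 95% identity, in line with the "5 of 100" reasoning given above.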

Each monomer of the dimeric I-CreI protein according to the invention may for example comprise at least, at most or about 2, 5, 8, 10, 12, 15, 18, 20 or 25 mutations compared with the sequence of a wild-type monomer (SEQ ID NO: 1) or with a monomer of SEQ ID NO: 42. In other terms, the monomer according to the invention comprises a sequence that differs from SEQ ID NO: 1 or SEQ ID NO: 42 by at least, at most or about 2, 5, 8, 10, 12, 15, 18, 20, 25 or 30 mutations.

In the frame of the present invention, the mutation preferably corresponds to a substitution of one amino acid with another amino acid. Therefore, a preferred embodiment according to the invention is directed to a dimeric I-CreI protein comprising or consisting of two monomers comprising a sequence at least 80% identical to SEQ ID NO: 1 or SEQ ID NO: 42, wherein said sequence only differs from SEQ ID NO: 1 or SEQ ID NO: 42 by the presence of amino acid substitutions.

The monomers of the dimeric I-CreI protein according to the invention are preferably derived from monomers comprising or consisting of the sequence of SEQ ID NO: 42.

The mutations are preferably located at positions of the I-CreI sequence that are involved in recognition of the target sequence. Indeed, introducing such mutations allows meganucleases with novel specificities to be designed.

In addition to such mutations, the monomers may also have mutations corresponding to:

- mutations that improve the binding and/or the cleavage properties of the protein towards the target site, such as e.g. G19S, G19A, F54L, S79G, E80K, F87L, V105A and/or I132V (see for example WO 2008/152524); and/or
- mutations that yield an obligate heterodimer (see for example WO 2008/093249 and Fajardo-Sanchez et al., Nucleic Acids Res. 2008 36:2163-73); and/or
- mutations suitable for the generation of a fusion protein such as, e.g., the deletion of the five most N-terminal amino acid residues of SEQ ID NO: 1 in the C-terminal monomer of a fusion protein; and/or
- a mutation consisting of the insertion of an alanine between the first and the second residue of SEQ ID NO: 1, as is the case in a monomer of SEQ ID NO: 42.

In addition to the sequence homologous to SEQ ID NO: 1 or SEQ ID NO: 42, the monomers of the protein according to the invention may comprise one or more amino acids added at the NH₂ terminus and/or COOH terminus of the sequence, such as a Tag useful in purification of the protein, a propeptide and/or a nuclear localization signal. In particular, the monomers of the protein according to the invention may comprise AAD amino acids added at the COOH terminus of the sequence of SEQ ID NO: 1, as is the case in a monomer of SEQ ID NO: 42.

In the present specification, the mutations are indicated by the position on SEQ ID NO: 1 followed by the nature of the amino acid replacing the amino acid located at this position in SEQ ID NO: 1. For example, a monomer comprising a 44A mutation refers to an I-CreI monomer in which the amino acid at position 44 of SEQ ID NO: 1 (i.e. a glutamine, Q) is replaced with an alanine (A). Thus this monomer differs from the wild-type I-CreI monomer of SEQ ID NO: 1 by at least the following amino acid substitution: Q44A. As explained hereabove, the I-CreI monomer of SEQ ID NO: 42 comprises some additional amino acid residues compared to the I-CreI monomer of SEQ ID NO: 1 (see FIG. 11). Therefore, on SEQ ID NO: 42, the 44A mutation corresponds to a replacement of the glutamine at position 45 of SEQ ID NO: 42 with an alanine.
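The nomenclature described above can be sketched in a few lines of Python. The helper names `parse_mutation` and `position_on_seq42` are hypothetical; the one-residue shift relies only on the statement that SEQ ID NO: 42 carries an alanine inserted between the first and the second residue of SEQ ID NO: 1.

```python
import re
from typing import Tuple


def parse_mutation(label: str) -> Tuple[int, str]:
    """Split a label such as '44A' into (position on SEQ ID NO: 1, new residue)."""
    match = re.fullmatch(r"(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    return int(match.group(1)), match.group(2)


def position_on_seq42(position_on_seq1: int) -> int:
    """Map a SEQ ID NO: 1 position onto SEQ ID NO: 42.

    Residue 1 keeps its position; every later residue is shifted by one
    because of the alanine inserted after the first residue.
    """
    return position_on_seq1 if position_on_seq1 == 1 else position_on_seq1 + 1


parsed = [parse_mutation(m) for m in "44A 54L 64A 70Q 75N 158R 162A".split()]
for position, residue in parsed:
    print(position, residue, position_on_seq42(position))
```

For the 44A mutation, the sketch returns position 45 on SEQ ID NO: 42, as stated in the paragraph above.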

For the purpose of illustration, a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations may for example have the sequence of SEQ ID NO: 57 (when this monomer is directly derived from an I-CreI monomer of SEQ ID NO: 1) or the sequence of SEQ ID NO: 58 (when this monomer is directly derived from an I-CreI monomer of SEQ ID NO: 42). FIG. 12 shows an alignment between two such monomers, and indicates the position of the 44A 54L 64A 70Q 75N 158R and 162A mutations on these monomers.

Examples of dimeric I-CreI proteins according to the invention, capable of cleaving target sequences located in the SH3, SH4 or SH6 locus, are further described below.

Dimeric I-CreI Protein According to the Invention Capable of Cleaving the SH3 Locus

In a preferred embodiment, the target sequence is located within the SH3 locus (defined hereabove). The target sequence located within SH3 may for example comprise or consist of SEQ ID NO: 2, or of nucleotides 2 to 23 of SEQ ID NO: 2. Example 1 discloses several examples of heterodimeric I-CreI proteins according to the invention capable of cleaving such a target sequence. In addition, methods for constructing other such proteins are well-known in the art and include e.g. those described in PCT applications WO 2006/097784, WO 2006/097853 and WO 2009/019614, and in Arnould et al. (J. Mol. Biol., 2006, 355:443-458).
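The expression "nucleotides 2 to 23" of a 24-nucleotide target can be illustrated with a short Python sketch. Because SEQ ID NO: 2 is not reproduced in this part of the description, the SH101 target sequence from Table E is used purely as a stand-in; the helper `subsequence` is hypothetical and assumes 1-based, inclusive numbering.

```python
def subsequence(sequence: str, first: int, last: int) -> str:
    """Return residues first..last of a sequence (1-based, inclusive)."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("positions out of range")
    return sequence[first - 1:last]


# Stand-in 24-nucleotide target (SH101, Table E), not SEQ ID NO: 2
target = "CCTACACCCTGTAAGATGGCTAGT"
core = subsequence(target, 2, 23)
print(len(target), len(core), core)
```

The resulting 22-nucleotide sequence is simply the 24-nucleotide target without its first and last nucleotides.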

The monomers of such a dimeric protein preferably comprise at least one, preferably at least 3, 4, 5 or 6, amino acid substitutions located at a position selected from the group consisting of positions 4, 24, 26, 28, 30, 32, 33, 38, 44, 50, 54, 57, 64, 66, 70, 71, 75, 77, 81, 86, 92, 105, 142, 151, 154, 158 and 162 of SEQ ID NO: 1, preferably positions 4, 30, 38, 44, 50, 54, 57, 64, 66, 70, 71, 75, 77, 81, 86, 92, 105, 142, 151, 154, 158 and 162 of SEQ ID NO: 1. Said substitutions may for example be selected from the following substitutions: 4E, 30G, 38R, 44A, 50R, 54L, 57E, 64A, 66C, 70Q, 70D, 71R, 75N, 75Y, 77V, 81T, 86D, 92R, 105A, 142R, 151A, 154G, 158R, 158W and 162A. The dimeric protein may optionally comprise a mutation at position 1; however, such a mutation has no influence on cleavage activity or on cleavage specificity.

Such dimeric I-CreI proteins may for example comprise or consist of:

- a first monomer comprising at least one amino acid substitution compared to SEQ ID NO: 1, wherein said at least one amino acid substitution is located at a position selected from the group consisting of positions 30, 38, 50, 70, 75, 81, 86, 142 and 154 of SEQ ID NO: 1. Preferably, said first monomer comprises substitutions at positions 30, 38, 70 and 75 of SEQ ID NO: 1. Most preferably, said substitutions are selected from the following substitutions: 30G, 38R, 50R, 70D, 75N, 81T, 86D, 142R and 154G. Such a monomer may for example comprise at least 4, 5 or 6 mutations compared to SEQ ID NO: 1, and/or at most 4, 5, 6, 8, 10, 12 or 15 amino acid mutations compared to SEQ ID NO: 1; and
- a second monomer comprising at least one amino acid substitution compared to SEQ ID NO: 1, wherein said at least one amino acid substitution is located at a position selected from the group consisting of positions 4, 44, 54, 57, 64, 66, 70, 71, 75, 77, 92, 105, 151, 158 and 162 of SEQ ID NO: 1. Preferably, said second monomer comprises substitutions at positions 44, 54, 70 and 75 of SEQ ID NO: 1. Most preferably, said substitutions are selected from the following substitutions: 4E, 44A, 54L, 57E, 64A, 66C, 70Q, 71R, 75N, 75Y, 77V, 92R, 105A, 151A, 158R, 158W and 162A. Such a monomer may for example comprise at least 4, 5 or 6 mutations compared to SEQ ID NO: 1, and/or at most 4, 6, 8, 10, 12 or 15 amino acid mutations compared to SEQ ID NO: 1.
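As a rough illustration, a candidate monomer written in the mutation notation of this specification can be compared with the "most preferably" substitution lists given above for the SH3-cleaving monomers. The sets below are copied from those lists; the function `outside_preferred_list` is a hypothetical helper, and a substitution reported by it is merely absent from the list, not excluded from the invention.

```python
# Substitution lists copied from the two bullet points above
FIRST_MONOMER_PREFERRED = {
    "30G", "38R", "50R", "70D", "75N", "81T", "86D", "142R", "154G",
}
SECOND_MONOMER_PREFERRED = {
    "4E", "44A", "54L", "57E", "64A", "66C", "70Q", "71R", "75N", "75Y",
    "77V", "92R", "105A", "151A", "158R", "158W", "162A",
}


def outside_preferred_list(mutations: str, preferred: set) -> set:
    """Return the mutation labels of a monomer that are not in the given list."""
    return set(mutations.split()) - preferred


first = outside_preferred_list("30G 38R 70D 75N 86D", FIRST_MONOMER_PREFERRED)
second = outside_preferred_list(
    "44A 54L 64A 70Q 75N 158R 162A", SECOND_MONOMER_PREFERRED
)
print(first, second)  # both sets are empty for these two monomers
```

An empty result indicates that every substitution of the monomer is drawn from the corresponding list.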

In a specific embodiment, the dimeric I-CreI protein according to the invention comprises or consists of:

a) a first monomer comprising 30G 38R 70D 75N 86D mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations;
    ii. a monomer comprising 44A 54L 70Q 75Y 92R 158R 162A mutations;
    iii. a monomer comprising 4E 44A 54L 64A 70Q 75N 158R 162A mutations;
    iv. a monomer comprising 44A 54L 64A 70Q 75N 158W 162A mutations;
    v. a monomer comprising 44A 54L 70Q 75N mutations;
    vi. a monomer comprising 44A 54L 57E 70Q 75N 158R 162A mutations; and
    vii. a monomer comprising 44V 54L 70Q 75N 77V mutations.

In another specific embodiment, the dimeric I-CreI protein according to the invention comprises or consists of:

a) a first monomer comprising 30G 38R 70D 75N 81T 154G mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 44A 54L 70Q 75N 105A 158R 162A mutations;
    ii. a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations;
    iii. a monomer comprising 4E 44A 54L 64A 70Q 75N 158R 162A mutations;
    iv. a monomer comprising 44A 54L 64A 70Q 75N 158W 162A mutations;
    v. a monomer comprising 44A 54L 70Q 75N mutations; and
    vi. a monomer comprising 44V 54L 70Q 75N 77V mutations.

In still another specific embodiment, the dimeric I-CreI protein according to the invention comprises or consists of:

a) a first monomer comprising 30G 38R 50R 70D 75N 142R mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 44A 54L 70Q 75N 105A 158R 162A mutations;
    ii. a monomer comprising 44A 54L 64A 70Q 75N 158R 162A mutations;
    iii. a monomer comprising 44A 54L 70Q 75Y 92R 158R 162A mutations;
    iv. a monomer comprising 4E 44A 54L 64A 70Q 75N 158R 162A mutations;
    v. a monomer comprising 44A 54L 64A 70Q 75N 158W 162A mutations;
    vi. a monomer comprising 44A 54L 66C 70Q 71R 75N 151A 158R 162A mutations;
    vii. a monomer comprising 44A 54L 70Q 75N mutations;
    viii. a monomer comprising 44A 54L 57E 70Q 75N 158R 162A mutations; and
    ix. a monomer comprising 44V 54L 70Q 75N 77V mutations.

The monomers of the dimeric I-CreI protein may also comprise additional mutations, for example mutations allowing an obligate heterodimer to be obtained. Such mutations are known to those skilled in the art and include those described in Fajardo-Sanchez et al. (Nucleic Acids Res. 2008 36:2163-73).

In a specific embodiment, the above monomers are directly derived from a monomer of SEQ ID NO: 42, and differ from the sequence of SEQ ID NO: 42 only by the presence of the indicated mutations.

Dimeric I-CreI Protein According to the Invention Capable of Cleaving the SH4 Locus

In a preferred embodiment, the target sequence is located within the SH4 locus (defined hereabove). The target sequence located within SH4 may for example comprise or consist of SEQ ID NO: 3, or of nucleotides 2 to 23 of SEQ ID NO: 3. Example 2 discloses several examples of dimeric I-CreI proteins according to the invention capable of cleaving such a target sequence.

The monomers of such a dimeric protein preferably comprise at least one, preferably at least 3, 4, 5 or 6, amino acid substitutions located at a position selected from the group consisting of positions 24, 44, 68, 70, 75 and 77 of SEQ ID NO: 1. Said substitutions may for example be selected from the following substitutions: 24V, 44R, 44Y, 68Y, 68A, 70S, 70D, 75Y, 75N, 77R, 77N and 77V.

Such dimeric I-CreI proteins may for example comprise or consist of:

- a first monomer comprising at least one amino acid substitution compared to SEQ ID NO: 1, wherein said at least one amino acid substitution is located at a position selected from the group consisting of positions 24, 44, 68, 70, 75 and 77 of SEQ ID NO: 1. Preferably, the first monomer comprises substitutions at positions 24, 70, 75 and 77 of SEQ ID NO: 1. Most preferably, said substitutions are selected from the following substitutions: 24V, 44R, 68Y, 68A, 70D, 70S, 75Y, 75N, 77N and 77R. Such a monomer may for example comprise at least 4, 5 or 6 mutations compared to SEQ ID NO: 1, and/or at most 4, 5, 6, 8, 10, 12 or 15 amino acid mutations compared to SEQ ID NO: 1; and
- a second monomer comprising at least one amino acid substitution compared to SEQ ID NO: 1, wherein said at least one amino acid substitution is located at a position selected from the group consisting of positions 24, 44, 70 and 77 of SEQ ID NO: 1. Preferably, the second monomer comprises substitutions at positions 24, 44 and 70 of SEQ ID NO: 1. Most preferably, said substitutions are selected from the following substitutions: 24V, 44Y, 70S and 77V. Such a monomer may for example comprise at least 3 or 4 mutations compared to SEQ ID NO: 1, and/or at most 3, 4, 6, 8, 10, 12 or 15 amino acid mutations compared to SEQ ID NO: 1.

In a specific embodiment, the dimeric I-CreI protein according to the invention comprises or consists of:

a) a first monomer selected from the group consisting of:
    i. a monomer comprising 24V 44R 68Y 70S 75Y 77N mutations;
    ii. a monomer comprising 24V 68A 70S 75N 77R mutations; and
    iii. a monomer comprising 24V 70D 75N 77R mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 24V 44Y 70S mutations; and
    ii. a monomer comprising 24V 44Y 70S 77V mutations.

The monomers of the dimeric I-CreI protein may also comprise additional mutations, for example mutations allowing an obligate heterodimer to be obtained. Such mutations are known to those skilled in the art and include those described in Fajardo-Sanchez et al. (Nucleic Acids Res. 2008 36:2163-73).

In a specific embodiment, the above monomers are directly derived from a monomer of SEQ ID NO: 42, and differ from the sequence of SEQ ID NO: 42 only by the presence of the indicated mutations.

Dimeric I-CreI Protein According to the Invention Capable of Cleaving the SH6 Locus

In a preferred embodiment, the target sequence is located within the SH6 locus (defined hereabove). The target sequence located within SH6 may for example comprise or consist of SEQ ID NO: 59, or of nucleotides 2 to 23 of SEQ ID NO: 59. Example 5 discloses several examples of dimeric I-CreI proteins according to the invention capable of cleaving such a target sequence.

The monomers of such a dimeric protein preferably comprise at least one, preferably at least 3, 4, 5 or 6, amino acid substitutions located at a position selected from the group consisting of positions 7, 24, 27, 28, 34, 40, 44, 68, 70, 75, 77, 81, 85, 96, 99, 103, 108, 111, 121, 132, 144 and 160 of SEQ ID NO: 1. Said substitutions may for example be selected from the following substitutions: 7R, 24F, 27V, 28Q, 34R, 40R, 44A, 44K, 68T, 70L, 70G, 70S, 75N, 77V, 81T, 81V, 85R, 96R, 99R, 103T, 103S, 108V, 111H, 121E, 132V, 144S, 160R and 160E.

Such dimeric I-CreI proteins may for example comprise or consist of:

- a first monomer comprising at least one amino acid substitution compared to SEQ ID NO: 1, wherein said at least one amino acid substitution is located at a position selected from the group consisting of positions 7, 24, 27, 28, 34, 40, 44, 70, 75, 77, 81, 85, 96, 99, 103, 108, 111, 121, 132, 144 and 160 of SEQ ID NO: 1. Preferably, the first monomer comprises substitutions at positions 28, 40, 44, 70 and 75 of SEQ ID NO: 1. Most preferably, said substitutions are selected from the following substitutions: 7R, 24F, 27V, 28Q, 34R, 40R, 44A, 70L, 75N, 77V, 81T, 81V, 85R, 96R, 99R, 103T, 103S, 108V, 111H, 121E, 132V, 144S, 160R and 160E. Such a monomer may for example comprise at least 5 or 6 mutations compared to SEQ ID NO: 1, and/or at most 5, 6, 8, 10, 12, 15 or 20 amino acid mutations compared to SEQ ID NO: 1; and
- a second monomer comprising at least one amino acid substitution compared to SEQ ID NO: 1, wherein said at least one amino acid substitution is located at a position selected from the group consisting of positions 44, 68, 70 and 75 of SEQ ID NO: 1. Preferably, the second monomer comprises substitutions at positions 44, 70 and 75 of SEQ ID NO: 1. Most preferably, said substitutions are selected from the following substitutions: 44K, 68T, 70G, 70S and 75N. Such a monomer may for example comprise at least 3 or 4 mutations compared to SEQ ID NO: 1, and/or at most 3, 4, 6, 8, 10, 12 or 15 amino acid mutations compared to SEQ ID NO: 1.

In a specific embodiment, the dimeric I-CreI protein according to the invention comprises or consists of:

a) a first monomer comprising 44K 68T 70G 75N mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 28Q 40R 44A 70L 75N 96R 111H 144S mutations;
    ii. a monomer comprising 7R 28Q 40R 44A 70L 75N 85R 103T mutations;
    iii. a monomer comprising 28Q 40R 44A 70L 75N 103S mutations;
    iv. a monomer comprising 24F 27V 28Q 40R 44A 70L 75N 99R mutations;
    v. a monomer comprising 7R 28Q 40R 44A 70L 75N 81T mutations;
    vi. a monomer comprising 7R 28Q 40R 44A 70L 75N 77V mutations;
    vii. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T 121E 132V 160R mutations;
    viii. a monomer comprising 28Q 40R 44A 70L 75N mutations;
    ix. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T mutations; and
    x. a monomer comprising 28Q 34R 40R 44A 70L 75N 81V 103T 108V 160E mutations.

In another specific embodiment, the dimeric I-CreI protein according to the invention comprises or consists of:

a) a first monomer comprising 44K 70S 75N mutations; and
b) a second monomer selected from the group consisting of:
    i. a monomer comprising 28Q 40R 44A 70L 75N 96R 111H 144S mutations;
    ii. a monomer comprising 7R 28Q 40R 44A 70L 75N 85R 103T mutations;
    iii. a monomer comprising 28Q 40R 44A 70L 75N 103S mutations;
    iv. a monomer comprising 24F 27V 28Q 40R 44A 70L 75N 99R mutations;
    v. a monomer comprising 7R 28Q 40R 44A 70L 75N 81T mutations;
    vi. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T 121E 132V 160R mutations;
    vii. a monomer comprising 7R 28Q 40R 44A 70L 75N 103T mutations; and
    viii. a monomer comprising 28Q 34R 40R 44A 70L 75N 81V 103T 108V 160E mutations.

The monomers of the dimeric I-CreI protein may also comprise additional mutations, for example mutations allowing an obligate heterodimer to be obtained. Such mutations are known to those skilled in the art and include those described in Fajardo-Sanchez et al. (Nucleic Acids Res. 2008 36:2163-73).

In a specific embodiment, the above monomers are directly derived from a monomer of SEQ ID NO: 42, and differ from the sequence of SEQ ID NO: 42 only by the presence of the indicated mutations.

Fusion Proteins According to the Invention

Fusion proteins comprising the two monomers of a dimeric I-CreI protein fused together and retaining the biological activity of the parent dimeric I-CreI protein can be constructed (Grizot et al. NAR 2009 37:5405; Li et al. Nucleic Acids Res. 2009 37:1650-62; Epinat et al. Nucleic Acids Res. 2003 31:2952-62). Such fusion proteins are commonly referred to as “single-chain meganucleases”.

Therefore, the invention further relates to a fusion protein comprising the two monomers of the dimeric I-CreI protein as defined hereabove, or biologically active fragments of such monomers. In such a fusion protein, the first and second monomers of a dimeric I-CreI protein as defined hereabove are fused together and are optionally connected to each other by a linker such as a peptidic linker. The linker may for example comprise or consist of SEQ ID NO: 43 or SEQ ID NO: 326.

In the frame of the present invention, it is understood that such afusion protein according to the invention is capable of cleaving atarget sequence according to the invention, i.e., it is capable ofcleaving the same target sequence as the dimeric I-CreI protein fromwhich it is derived. The single chain meganuclease of the presentinvention further comprises obligate heterodimer mutations as describedabove so as to obtain single chain obligate heterodimer meganucleasevariants.

In the first version of I-CreI single chain (Epinat et al. NAR 2003 31:2952-2962; WO 03/078619), the N-terminal monomer of the single-chain meganuclease consisted essentially of positions 1 to 93 of the I-CreI amino acid sequence whereas the C-terminal monomer (positions 8 to 163 of the I-CreI amino acid sequence) was a nearly complete I-CreI monomer. More recently, a new way to design a single chain molecule derived from the I-CreI homodimeric meganuclease consisted in joining two nearly complete C-terminal and N-terminal I-CreI monomers (see, e.g. WO 2009/095793). This design greatly decreases off-site cleavage and toxicity while enhancing efficacy. The structure and stability of this single-chain molecule are very similar to those of the dimeric variants and this molecule appears to be monomeric in solution. In all respects, this single-chain molecule performs as well as I-SceI, which is considered to be the gold standard in terms of specificity. These properties place this new generation of meganucleases among the best molecular scissors available for genome surgery strategies and should facilitate gene correction therapy for monogenetic diseases, such as for example severe combined immunodeficiency (SCID), while potentially avoiding the deleterious effects of previous gene therapy approaches.

In addition to the mutations described hereabove, additional mutations may be introduced into the sequence of each of the two monomers of the fusion protein. For example, the C-terminal monomer may comprise the K7E and K96E mutations, and the N-terminal monomer may comprise the E8K, E61R and G19S mutations.

Examples 1, 2 and 5 disclose several examples of such fusion proteins according to the invention.

In a specific embodiment, the fusion protein according to the invention comprises or consists of a sequence at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identical to any one of SEQ ID Nos. 25-40 and 76-96, or to a fragment of at least 50, 100, 150 or 200 amino acids thereof.
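The identity thresholds listed above can be made concrete with a small calculation. The sketch below assumes that the two sequences have already been aligned (gaps written as "-") and that identity is counted over the aligned columns; this section does not state which alignment method or counting convention is intended, so both are assumptions, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a gap opposite a
    residue counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy sequences for illustration only (not sequences from the listing).
example_identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

In this toy case nine of ten columns match, so the pair would satisfy the 80%, 85% and 90% thresholds but not the 95% one.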

Nucleic Acids, Vectors and Combinations According to the Invention

When inserting a transgene into the genome of a cell, tissue or animal, the endonuclease according to the invention is preferably introduced to said cell, tissue or animal as a nucleic acid molecule rather than as a protein.

Therefore, the invention pertains to a nucleic acid encoding the endonuclease according to the invention, e.g. encoding a dimeric I-CreI protein or a fusion protein described hereabove. When the endonuclease is a dimeric I-CreI protein, said nucleic acid comprises at least two coding sequences, one for each monomer. When the endonuclease is a fusion protein, said nucleic acid comprises at least one coding sequence. The endonuclease protein can be combined with a variety of cell-penetrating peptides, leading to a recombinant protein; such combined molecules are able to enter target cells at much higher levels of efficiency than the endonuclease alone. These cell-penetrating peptides were developed by Diatos S. A. (WO01/64738; WO05/016960; WO03/018636; WO05/018650; WO07/069068). The applicant has previously shown that endonuclease/cell-penetrating peptide combinations can enter target cells efficiently and that the internalized endonuclease can act upon the target cell genome so as to generate a DSB and in turn stimulate a homologous recombination event. The applicant has shown that the complex three-dimensional structure of the endonuclease is not affected by the presence of the cell-penetrating peptide and that the all-important specificity of the endonuclease also remains unaffected (data not shown).

Another aspect of the invention is a vector comprising such a nucleic acid according to the invention. By “vector” is meant a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.

Vectors which can be used in the present invention include but are not limited to viral vectors, plasmids and YACs, which may consist of chromosomal, non-chromosomal, semisynthetic or synthetic nucleic acids. Preferred vectors are those capable of autonomous replication (episomal vectors) and/or expression of nucleic acids to which they are linked (expression vectors). Large numbers of suitable vectors are known to those of skill in the art and are commercially available.

In a preferred embodiment, the vector is a viral vector such as e.g. a vector derived from a retrovirus, an adenovirus, a parvovirus (e.g. an adeno-associated virus), a coronavirus, a negative strand RNA virus (e.g. an orthomyxovirus such as influenza virus, a rhabdovirus such as rabies and vesicular stomatitis virus, a paramyxovirus such as measles and Sendai virus), a positive strand RNA virus such as picornavirus and alphavirus, or a double-stranded DNA virus such as adenovirus, herpesvirus (e.g. Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus) and poxvirus (e.g. vaccinia, fowlpox and canarypox). Preferred vectors include lentiviral vectors, and particularly self-inactivating lentiviral vectors.

In addition to the sequence coding for the endonuclease according to the invention, the vector can also comprise elements such as:

-   transcriptional and translational control elements such as promoters, enhancers, polyadenylation sites, termination signals, introns, etc.;
-   a multiple cloning site;
-   a replication origin;
-   selection markers;
-   a transgene; and/or
-   a targeting construct comprising sequences sharing homologies with the region surrounding the genomic target site as defined herein.

In a preferred embodiment, said vector is an “expression vector”, i.e. a vector in which at least one coding sequence is operatively linked to transcriptional and translational control elements. In the frame of this embodiment, the nucleic acid encoding the endonuclease according to the invention (e.g. encoding the dimeric I-CreI protein or the fusion protein described hereabove) is operatively linked to transcriptional and translational control elements.

In a preferred embodiment, the vector according to the invention comprises a targeting construct comprising a transgene and two sequences homologous to the genomic sequence flanking the target sequence as defined herein (e.g. the target sequence of SEQ ID NO: 2 or 3). The genomic sequences flanking the target sequence are preferably immediately adjacent to the target site.

Such targeting constructs are well-known to those skilled in the art. For insertion of a transgene, such constructs typically comprise a first sequence that is homologous to the upstream (5′) genomic sequence flanking the target sequence, the transgene to be inserted, and a second fragment that is homologous to the downstream (3′) genomic sequence flanking the target sequence.

By “homologous” is intended a sequence with enough identity to another one to lead to a homologous recombination between sequences, more particularly having at least 95% identity, preferably 97% identity and more preferably 99% identity to each other.

Preferably, homologous sequences of at least 50 bp, preferably more than 100 bp and more preferably more than 200 bp are used. Therefore, the targeting DNA construct is preferably from 200 bp to 6000 bp, more preferably from 1000 bp to 2000 bp. Indeed, the shared DNA homologies are located in regions flanking the site of the break upstream and downstream, and the DNA sequence to be introduced should be located between the two arms.
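The layout of a targeting construct described above (upstream homology arm, transgene, downstream homology arm, with a minimum arm length) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names are hypothetical, the 50 bp minimum is the lowest figure given in the text, and the toy sequences are placeholders rather than genomic or transgene sequences.

```python
from dataclasses import dataclass

MIN_ARM_BP = 50  # lowest homology length mentioned in the text


@dataclass(frozen=True)
class TargetingConstruct:
    upstream_arm: str    # homologous to the 5' genomic flank of the target
    transgene: str       # sequence to be inserted
    downstream_arm: str  # homologous to the 3' genomic flank of the target

    def sequence(self) -> str:
        """Linear sequence: 5' arm, then transgene, then 3' arm."""
        return self.upstream_arm + self.transgene + self.downstream_arm


def build_targeting_construct(
    upstream_arm: str,
    transgene: str,
    downstream_arm: str,
    min_arm_bp: int = MIN_ARM_BP,
) -> TargetingConstruct:
    """Assemble a construct, rejecting homology arms shorter than `min_arm_bp`."""
    for name, arm in (("upstream", upstream_arm), ("downstream", downstream_arm)):
        if len(arm) < min_arm_bp:
            raise ValueError(f"{name} arm is {len(arm)} bp; need >= {min_arm_bp} bp")
    return TargetingConstruct(upstream_arm, transgene, downstream_arm)


# Placeholder sequences for illustration only.
toy_construct = build_targeting_construct("A" * 60, "ATGTAA", "C" * 60)
```

Raising `min_arm_bp` to 100 or 200 would model the more preferred arm lengths.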

The targeting construct may also comprise a positive selection marker between the two homology arms and optionally a negative selection marker upstream of the first homology arm or downstream of the second homology arm. The marker(s) allow(s) the selection of cells having inserted the sequence of interest by homologous recombination at the target site.

Methods for constructing targeting constructs suitable for inserting a transgene into the SH3 or SH4 locus are given in Example 4.

The nucleic acid encoding the endonuclease according to the invention and the targeting construct can also be located on two separate vectors. Therefore, the invention also pertains to a combination of two vectors, namely:

-   an expression vector according to the invention; and
-   a vector comprising a targeting construct comprising a transgene and two sequences homologous to the genomic sequence of the target sequence according to the invention.

Pharmaceutical Uses According to the Invention

The vectors and combinations described hereabove can for example be used as a medicament. In particular, these vectors and combinations can be used in gene therapy.

Therefore, the invention relates to a vector or combination according to the invention for use as a medicament. In such vectors and combinations, the transgene encodes a therapeutic polypeptide.

In particular, diseases that may be treated by gene therapy using the vectors and combinations according to the invention include but are not limited to X-SCID, SCID, epidermolysis bullosa, Leber amaurosis, hemophilia, thalassemia, Fanconi anemia and muscular dystrophy.

In these diseases, the transgene encodes the following therapeutic polypeptides, respectively: IL2RG, GI7A1, Rp 65, Blood factors VIII and IX, haemoglobin A and B, Fanc-A, Fanc-C (or other Fanconi Anemia related genes), dystrophin.

The invention further relates to a pharmaceutical composition comprising the vectors and combinations according to the invention and a pharmaceutically active carrier.

The invention also relates to a method of treating an individual by gene therapy comprising administering an effective amount of a vector or combination according to the invention to an individual in need thereof.

By “effective amount” is meant an amount sufficient to achieve insertion of the transgene into the genome of the individual to be treated. Such amounts can be routinely determined by those skilled in the art.

By “individual in need thereof” is meant an individual suffering from or susceptible to suffering from a genetic disease that can be treated or prevented by insertion of the transgene. The individuals to be treated in the frame of the invention are preferably human beings.

Non Pharmaceutical Uses According to the Invention

The vectors and combinations described hereabove find use not only in gene therapy but also in non-pharmaceutical uses such as, e.g., production of animal models and production of recombinant cell lines expressing a protein of interest.

Therefore, the invention relates to:

-   the use of an endonuclease, nucleic acid, expression vector or combination according to the invention for inserting a transgene into the genome of a cell, tissue or non-human animal, wherein said use is not therapeutic; and
-   a method of inserting a transgene into the genome of a cell, tissue or non-human animal, comprising the step of bringing said cell, tissue or non-human animal in contact with an endonuclease, nucleic acid, expression vector or combination according to the invention, thereby inserting said transgene into said genome.

In a preferred embodiment, the above use or method aims at inserting a transgene encoding a protein of interest into the genome of a cell in order to obtain a recombinant cell line for protein production. Suitable cells for constructing recombinant cell lines for protein production include but are not limited to human (e.g. PER.C6 or HEK), Chinese hamster ovary (CHO) and mouse (NS0) cells.

In another preferred embodiment, the above use aims at making a non-human animal model of a hereditary disorder.

The invention is also directed to a non-human transgenic animal comprising a nucleic acid, an expression vector or a combination according to the invention in its genome.

All references cited herein, including journal articles or abstracts, published patent applications, issued patents or any other references, are entirely incorporated by reference herein, including all data, tables, figures and text presented in the cited references.

The invention will be further illustrated by the following examples and figures.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 represents target sequences of meganucleases described in Example 1.

FIGS. 2 and 3 represent SCOH SH3 meganucleases vs. I-SceI and SCOH-RAG DNA dose response in CHO.

FIG. 4 represents target sequences of meganucleases described in Example 2.

FIGS. 5 and 6 represent SCOH SH4 meganucleases vs. I-SceI and SCOH-RAG DNA dose response in CHO.

FIG. 7 represents a scheme of the mechanism leading to the generation of small deletions and insertions (InDel) during repair of double-strand break by non homologous end-joining (NHEJ).

FIG. 8 represents the insertion sites upon cleavage with SH3 or SH4 meganucleases.

FIG. 9 represents target sequences of meganucleases described in Example 5.

FIG. 10 represents SCOH SH6 meganucleases vs. I-SceI and SCOH-RAG DNA dose response in CHO.

FIG. 11 represents a sequence alignment between a I-CreI monomer of SEQ ID NO: 1 and a I-CreI monomer of SEQ ID NO: 42.

FIG. 12 represents a sequence alignment between a I-CreI monomer of SEQ ID NO: 1 and two I-CreI monomers comprising 44A 54L 64A 70Q 75N 158R and 162A mutations. The first one (SEQ ID NO: 57) is directly derived from SEQ ID NO: 1 and the second one (SEQ ID NO: 58) is directly derived from SEQ ID NO: 42.

FIGS. 13 to 17 illustrate examples 6 to 9.

BRIEF DESCRIPTION OF THE SEQUENCES

SEQ ID NO: 1 shows the amino acid sequence of a wild-type I-CreI monomer.

SEQ ID NO: 2 shows the sequence of a target sequence according to the invention that is located within the SH3 locus.

SEQ ID NO: 3 shows the sequence of a target sequence according to the invention that is located within the SH4 locus.

SEQ ID NO: 4 shows the sequence of the target sequence of the wild-type I-CreI homodimeric protein.

SEQ ID Nos. 5 to 10 represent sequences shown on FIG. 1.

SEQ ID Nos. 11 to 15 represent oligonucleotides, primers and linkers used in Example 1.

SEQ ID Nos. 16 to 19 represent sequences shown on FIG. 4.

SEQ ID Nos. 20 to 24 represent oligonucleotides, primers and linkers used in Example 2.

SEQ ID Nos. 25 to 32 represent the single-chain meganucleases constructed in Example 1, referred to as SCOH-SH3-b56-A, SCOH-SH3-b56-B, SCOH-SH3-b56-C, SCOH-SH3-b56-D, SCOH-SH3-b1-A, SCOH-SH3-b1-B, SCOH-SH3-b1-C and SCOH-SH3-b1-D respectively.

SEQ ID Nos. 33 to 40 represent the single-chain meganucleases constructed in Example 2, referred to as SCOH-SH4-b56-A, SCOH-SH4-b56-B, SCOH-SH4-b56-C, SCOH-SH4-b56-D, SCOH-SH4-b1-A, SCOH-SH4-b1-B, SCOH-SH4-b1-C and SCOH-SH4-b1-D respectively.

SEQ ID NO: 41 represents the positive control SCOH-RAG.

SEQ ID NO: 42 shows the amino acid sequence of a I-CreI monomer with an additional alanine at position 2, and with three additional residues after the final proline.

SEQ ID NO: 43 shows the amino acid sequence of the RM2 linker.

SEQ ID Nos. 44 to 49 represent oligonucleotides, primers and linkers used in Example 3.

SEQ ID Nos. 50 to 53 represent oligonucleotides, primers and linkers used in Example 4.

SEQ ID Nos. 54 to 56 show sequences comprised in the SH3, SH4 and SH6 loci, respectively.

SEQ ID NO: 57 shows a monomer derived from a monomer of SEQ ID NO: 1 that comprises 44A 54L 64A 70Q 75N 158R 162A mutations.

SEQ ID NO: 58 shows a monomer derived from a monomer of SEQ ID NO: 42 that comprises 44A 54L 64A 70Q 75N 158R 162A mutations.

SEQ ID NO: 59 shows the sequence of a target sequence according to the invention that is located within the SH6 locus.

SEQ ID Nos. 60 to 64 represent sequences shown on FIG. 9.

SEQ ID Nos. 65 to 75 represent oligonucleotides, primers and linkers used in Example 5.

SEQ ID Nos. 76 to 85 represent the single-chain meganucleases constructed in Example 5, referred to as SCOH-SH6-b1-B, SCOH-SH6-b1-C, SCOH-SH6-b1-C, QCSH61-A01, QCSH61-E01, QCSH61-H0, QCSH62-A02, QCSH61-H01b, QCSH61-H01c and QCSH61-H01d respectively.

SEQ ID Nos. 86 to 96 represent the single-chain meganucleases capable of cleaving the SH7 locus (SEQ ID Nos. 86 and 87), the SH8 locus (SEQ ID NO: 88), the SH12 locus (SEQ ID NO: 89), the SH13 locus (SEQ ID NO: 90), the SH19 locus (SEQ ID NO: 91), the SH20 locus (SEQ ID NO: 92), the SH21 locus (SEQ ID Nos. 93 to 95) and the SH33 locus (SEQ ID NO: 96).

SEQ ID Nos. 97 to 104 represent sequences comprised within the SH12, SH13, SH19, SH20, SH21, SH33, SH7 and SH8 loci, respectively.

SEQ ID Nos. 105 to 325 represent sequences disclosed in Examples 6 to 9 and/or in any one of Tables A′, A″, E, G and H.

SEQ ID NO: 326 shows the amino acid sequence of the BQY linker.

EXAMPLES

In the following examples, all the I-CreI variants were constructed by genetic engineering of I-CreI monomers of SEQ ID NO: 42.

Example 1 Engineering Meganucleases Targeting the SH3 Locus

SH3 is a locus comprising a 24 bp non-palindromic target (SEQ ID NO: 2) that is present on chromosome 6. As shown in Table A, SH3 is located in the vicinity of a RIS disclosed in Deichmann et al. (J. of Clin. Invest. 2007 117:2225). The SH3 sequence is not included in any of the CIS described in Deichmann et al.

I-CreI heterodimers capable of cleaving a target sequence of SEQ ID NO: 2 were identified using methods derived from those described in Chames et al. (Nucleic Acids Res., 2005, 33, e178), Arnould et al. (J. Mol. Biol., 2006, 355, 443-458), Smith et al. (Nucleic Acids Res., 2006, 34, e149) and Arnould et al. (J. Mol. Biol., 2007, 371, 49-65). Some of these heterodimers were then cloned into mammalian expression vectors for assessing SH3 cleavage in CHO cells. These results were then utilized to design single-chain meganucleases directed against the target sequence of SEQ ID NO: 2. These single-chain meganucleases were cloned into mammalian expression vectors and tested for SH3 cleavage in CHO cells. Strong cleavage activity of the SH3 target could be observed for these single chain molecules in mammalian cells.

Example 1.1. Identification of Meganucleases Cleaving SH3

I-CreI variants potentially cleaving the SH3 target sequence in heterodimeric form were constructed by genetic engineering. Pairs of such variants were then co-expressed in yeast. Upon co-expression, one obtains three molecular species, namely two homodimers and one heterodimer. It was then determined whether the heterodimers were capable of cutting the SH3 target sequence of SEQ ID NO: 2.

a) Construction of Variants of the I-CreI Meganuclease Cleaving Palindromic Sequences Derived from the SH3 Target Sequence

The SH3 sequence is partially a combination of the 10AAT_P (SEQ ID NO: 5), 5AAG_P (SEQ ID NO: 6), 10AGG_P (SEQ ID NO: 7) and 5TTT_P (SEQ ID NO: 8) target sequences which are shown on FIG. 1. These sequences are cleaved by meganucleases obtained as described in International PCT applications WO 2006/097784 and WO 2006/097853, Arnould et al. (J. Mol. Biol., 2006, 355, 443-458) and Smith et al. (Nucleic Acids Res., 2006, 34, e149). Thus, SH3 should be cleaved by combinatorial variants resulting from these previously identified meganucleases.

Two palindromic targets, SH3.3 and SH3.4, were derived from SH3 (FIG. 1). Since SH3.3 and SH3.4 are palindromic, they should be cleaved by homodimeric proteins. Therefore, homodimeric I-CreI variants cleaving either the SH3.3 palindromic target sequence of SEQ ID NO: 9 or the SH3.4 palindromic target sequence of SEQ ID NO: 10 were constructed using methods derived from those described in Chames et al. (Nucleic Acids Res., 2005, 33, e178), Arnould et al. (J. Mol. Biol., 2006, 355, 443-458), Smith et al. (Nucleic Acids Res., 2006, 34, e149) and Arnould et al. (J. Mol. Biol., 2007, 371, 49-65).
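The decomposition of a non-palindromic target into named half-site modules and the derivation of two palindromic targets from it can be sketched in a few lines. Several points here are assumptions rather than statements from this section: that "10NNN" names positions -10 to -8 and "5NNN" names positions -5 to -3 of a half-site read outward-in; that each palindrome is one half-site joined to its own reverse complement; and that the 24 nt window taken from the middle of the oligonucleotide of SEQ ID NO: 11 is the SH3 target. The derived palindromes have not been checked against SEQ ID Nos. 9 and 10, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def half_site_modules(half: str) -> tuple[str, str]:
    """Name the two triplets of a 12 nt half-site read from position -12 to -1.

    Assumed convention: '10NNN' = positions -10..-8, '5NNN' = positions -5..-3.
    """
    if len(half) != 12:
        raise ValueError("a half-site must be 12 nucleotides long")
    return "10" + half[2:5], "5" + half[7:10]


def derive_palindromes(target: str) -> tuple[str, str]:
    """Return (left palindrome, right palindrome) built from a 24 nt target."""
    if len(target) != 24:
        raise ValueError("expected a 24 nt target")
    left, right = target[:12], target[12:]
    return left + reverse_complement(left), reverse_complement(right) + right


# Oligonucleotide of SEQ ID NO: 11 as quoted in the text; the 24 nt window
# below is an ASSUMED position of the target inside it.
OLIGO_SEQ_ID_11 = "TGGCATACAAGTTTCCAATACAAGGTACAAAGTCCTGACAATCGTCTGTCA"
ASSUMED_TARGET = OLIGO_SEQ_ID_11[14:38]

left_modules = half_site_modules(ASSUMED_TARGET[:12])
right_modules = half_site_modules(reverse_complement(ASSUMED_TARGET[12:]))
left_palindrome, right_palindrome = derive_palindromes(ASSUMED_TARGET)
```

Under these assumptions the modules come out as 10AAT and 5AAG for the left half and 10AGG and 5TTT for the right half, which are the four module names listed for SH3 above.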

b) Construction of Target Vector

An oligonucleotide of SEQ ID NO: 11, corresponding to the SH3 target sequence flanked by gateway cloning sequences, was ordered from PROLIGO. This oligo has the following sequence: TGGCATACAAGTTTCCAATACAAGGTACAAAGTCCTGACAATCGTCTGTCA. Double-stranded target DNA, generated by PCR amplification of the single-stranded oligonucleotide, was cloned into the pCLS1055 yeast reporter vector using the Gateway protocol (INVITROGEN).

The yeast reporter vector was transformed into the FYBL2-7B Saccharomyces cerevisiae strain having the following genotype: MAT a, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202. The resulting strain corresponds to a reporter strain (MilleGen).

c) Co-Expression of Variants

The open reading frames coding for the variants cleaving the SH3.4 or the SH3.3 sequence were cloned in the pCLS542 expression vector and in the pCLS1107 expression vector, respectively. Yeast DNA from these variants was extracted using standard protocols and was used to transform E. coli. The resulting plasmids were then used to co-transform yeast. Transformants were selected on synthetic medium lacking leucine and containing G418.

d) Mating of Meganuclease-Coexpressing Clones and Screening in Yeast

Mating was performed using a colony gridder (QpixII, Genetix). Variants were gridded on nylon filters covering YPD plates, using a low gridding density (4-6 spots/cm²). A second gridding process was performed on the same filters to spot a second layer consisting of different reporter-harboring yeast strains for each target. Membranes were placed on solid agar YPD rich medium, and incubated at 30° C. for one night, to allow mating. Next, filters were transferred to synthetic medium lacking leucine and tryptophan, supplemented with G418, with galactose (2%) as a carbon source, and incubated for five days at 37° C., to select for diploids carrying the expression and target vectors. After 5 days, filters were placed on solid agarose medium with 0.02% X-Gal in 0.5 M sodium phosphate buffer, pH 7.0, 0.1% SDS, 6% dimethyl formamide (DMF), 7 mM β-mercaptoethanol, 1% agarose, and incubated at 37° C., to monitor β-galactosidase activity. Results were analyzed by scanning and quantification was performed using appropriate software.

e) Results

Co-expression of different variants resulted in cleavage of the SH3 target in 58 tested combinations. Functional combinations are summarized in Table I herebelow. In this table, “+” indicates a functional combination on the SH3 target sequence, i.e., the heterodimer is capable of cleaving the SH3 target sequence.

TABLE I
Column headers: amino acids positions and residues of the I-CreI variants cleaving the SH3.3 target (column boundaries not recoverable from the flattened text): 44A 4E 1V 54L 44A 44A 44A 44A 44A 66C 44A 54L 54L 54L 54L 54L 70Q 54L 70Q 64A 70Q 64A 64A 71R 57E 44V 75N 70Q 75Y 70Q 70Q 75N 44A 70Q 54L 105A 75N 92R 75N 75N 151A 54L 75N 70Q 158R 158R 158R 158R 158W 158R 70Q 158R 75N 162A 162A 162A 162A 162A 162A 75N 162A 77V
Rows: amino acids positions and residues of the I-CreI variants cleaving the SH3.4 target:
30G 38R 70D 75N 86D: + + + + + + +
30G 38R 70D 75N 81T 154G: + + + + + +
30G 38R 50R 70D 75N 142R: + + + + + + + + +

In conclusion, several heterodimeric I-CreI variants, capable of cleaving the SH3 target sequence in yeast, were identified.

Example 1.2. Validation of SH3 Target Cleavage in an Extrachromosomal Model in CHO Cells

I-CreI variants able to efficiently cleave the SH3 target in yeast when forming heterodimers are described hereabove in example 1.1. In order to identify heterodimers displaying maximal cleavage activity for the SH3 target in CHO cells, the efficiency of some of these variants was compared using an extrachromosomal assay in CHO cells. The screen in CHO cells is a single-strand annealing (SSA) based assay where cleavage of the target by the meganucleases induces homologous recombination and expression of a LagoZ reporter gene (a derivative of the bacterial lacZ gene).

a) Cloning of SH3 Target in a Vector for CHO Screen

An oligonucleotide corresponding to the SH3 target sequence flanked by gateway cloning sequences was ordered from PROLIGO (SEQ ID NO: 12; TGGCATACAAGTTTCCAATACAAGGTACAAAGTCCTGACAATCGTCTGTCA). Double-stranded target DNA, generated by PCR amplification of the single-stranded oligonucleotide, was cloned using the Gateway protocol (INVITROGEN) into the pCLS1058 CHO reporter vector. The cloned target was verified by sequencing (MILLEGEN).

b) Re-Cloning of Meganucleases

The open reading frames coding for the variants identified in Table I hereabove were sub-cloned into the pCLS2437 expression vector. ORFs were amplified by PCR on yeast DNA using primers of SEQ ID Nos. 13 and 14 (5′-AAAAAGCAGGCTGGCGCGCCTACACAGCGGCCTTGCCACCATG-3′ and 5′-AGAAAGCTGGGTGCTAGCGCTCGAGTTATCAGTCGG-3′). PCR products were cloned in the CHO expression vector pCLS2437 using the AscI and XhoI restriction enzymes for internal fragment replacement. Selected clones resulting from ligation and E. coli transformation steps were verified by sequencing (MILLEGEN).
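As a quick consistency check on the cloning step above, one can confirm that the two primers carry the recognition sequences of the restriction enzymes named in the text. The sketch below uses the standard recognition sequences of AscI (GGCGCGCC) and XhoI (CTCGAG); the constant and function names are hypothetical, and the sketch only searches for the sites, it does not model digestion or ligation.

```python
# Standard recognition sequences; both are palindromic, so searching one
# strand is sufficient.
RECOGNITION_SITES = {
    "AscI": "GGCGCGCC",
    "XhoI": "CTCGAG",
}

# Primers as quoted in the text (SEQ ID Nos. 13 and 14), written 5' to 3'.
PRIMER_A = "AAAAAGCAGGCTGGCGCGCCTACACAGCGGCCTTGCCACCATG"
PRIMER_B = "AGAAAGCTGGGTGCTAGCGCTCGAGTTATCAGTCGG"


def sites_in(primer: str) -> list[str]:
    """Names of the enzymes whose recognition sequence occurs in `primer`."""
    return [name for name, site in RECOGNITION_SITES.items() if site in primer]


sites_primer_a = sites_in(PRIMER_A)
sites_primer_b = sites_in(PRIMER_B)
```

The first primer contains only the AscI site and the second only the XhoI site, which is consistent with directional cloning of the PCR product between these two sites.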

c) Extrachromosomal Assay in Mammalian Cells

CHO K1 cells were transfected with Polyfect® transfection reagent according to the supplier's protocol (QIAGEN). 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added (typically 1 liter of buffer contained 100 ml of lysis buffer (Tris-HCl 10 mM pH 7.5, NaCl 150 mM, Triton X100 0.1%, BSA 0.1 mg/ml, protease inhibitors), 10 ml of Mg 100× buffer (MgCl₂ 100 mM, β-mercaptoethanol 35%), 110 ml ONPG 8 mg/ml and 780 ml of sodium phosphate 0.1 M pH 7.5). After incubation at 37° C., OD was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform.
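The buffer recipe above can be checked with simple dilution arithmetic: the four listed volumes should add up to one liter, and each stock is diluted by its share of the total volume. The following sketch works only from the figures quoted in the text; the variable and function names are hypothetical, and the final concentrations it derives are a calculation, not values stated by the applicant.

```python
# Volumes per liter of lysis/revelation buffer, as listed in the text (ml).
VOLUMES_ML = {
    "lysis buffer": 100.0,
    "Mg 100x buffer": 10.0,
    "ONPG stock (8 mg/ml)": 110.0,
    "sodium phosphate 0.1 M pH 7.5": 780.0,
}


def final_concentration(stock: float, volume_ml: float, total_ml: float) -> float:
    """Concentration after diluting `volume_ml` of stock into `total_ml`."""
    return stock * volume_ml / total_ml


total_ml = sum(VOLUMES_ML.values())

# ONPG: 8 mg/ml stock, 110 ml per liter.
onpg_mg_per_ml = final_concentration(8.0, VOLUMES_ML["ONPG stock (8 mg/ml)"], total_ml)
# Components of the Mg 100x buffer: 100 mM MgCl2 and 35% beta-mercaptoethanol.
mgcl2_mm = final_concentration(100.0, VOLUMES_ML["Mg 100x buffer"], total_ml)
bme_percent = final_concentration(35.0, VOLUMES_ML["Mg 100x buffer"], total_ml)
```

The volumes sum to 1000 ml, and the 10 ml of Mg 100× buffer per liter corresponds to the 100-fold dilution its name implies.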

Per assay, 150 ng of target vector was cotransfected with 12.5 ng of each of the two variants.

d) Results

The four following variants described in Table I were re-cloned into pCLS2437:

-   44A 54L 70Q 75Y 92R 158R 162A (referred to as SH3.3-MA);
-   1V 44A 54L 64A 70Q 75N 158W 162A (referred to as SH3.3-MB);
-   30G 38R 70D 75N 86D (referred to as SH3.4-M1); and
-   30G 38R 70D 75N 81T 154G (referred to as SH3.4-M2).

These I-CreI variants were assayed together as heterodimers against the SH3 target in the CHO extrachromosomal assay.

Table II shows the functional combinations obtained for the four heterodimers.

TABLE II

| Optimized variants cleaving SH3.4 | Optimized variant cleaving SH3.3: 44A 54L 70Q 75Y 92R 158R 162A | Optimized variant cleaving SH3.3: 1V 44A 54L 64A 70Q 75N 158W 162A |
|---|---|---|
| 30G 38R 70D 75N 86D | + | + |
| 30G 38R 70D 75N 81T 154G | + | + |

Analysis of the efficiencies of cleavage and recombination of the SH3 sequence demonstrates that all four tested combinations of I-CreI variants were able to transfer their cleavage activity from yeast to CHO cells without additional mutation.

Example 1.3. Covalent Assembly as Single Chain and Improvement of Meganucleases Cleaving SH3

Co-expression of the variants identified in example 1.1. leads to a high cleavage activity of the SH3 target in yeast. Some of the heterodimers have been validated for SH3 cleavage in a mammalian expression system (example 1.2.). One of them, shown in Table III, was selected for further optimization.

TABLE III

| SH3 variant | Amino acids positions and residues of the I-CreI variants |
|---|---|
| SH3.3-MA | 44A 54L 70Q 75Y 92R 158R 162A |
| SH3.4-M1 | 30G 38R 70D 75N 86D |

The MA×M1 SH3 heterodimer gives high cleavage activity in yeast. SH3.3-MA is a SH3.3 cutter that bears the following mutations in comparison with the I-CreI wild type sequence: 44A 54L 70Q 75Y 92R 158R 162A. SH3.4-M1 is a SH3.4 cutter that bears the following mutations in comparison with the I-CreI wild type sequence: 30G 38R 70D 75N 86D.

Single chain constructs were engineered using the linker RM2 of SEQ ID NO: 15 (AAGGSDKYNQALSKYNQALSKYNQALSGGGGS), thus resulting in the production of the single chain molecule: MA-linkerRM2-M1. During this design step, the G19S mutation was introduced in the C-terminal M1 variant. In addition, mutations K7E, K96E were introduced into the MA variant and mutations E8K, E61R into the M1 variant to create the single chain molecule: MA (K7E K96E)-linkerRM2-M1 (E8K E61R G19S) that is further called SCOH-SH3-b1 scaffold. Some additional amino-acid substitutions have been found in previous studies to enhance the activity of I-CreI derivatives: the replacement of Isoleucine 132 with Valine (I132V) is one of them. The I132V mutation was introduced into either one, both or none of the coding sequences of the N-terminal and C-terminal protein fragments.

The same strategy was applied to a second scaffold, termed SCOH-SH3-b56 scaffold, based on the best variants cleaving SH3.3 (44A 54L 70Q 75Y 92R 158R 162A) and SH3.4 (30G 38R, 50R 70D 75N 142R) as homodimers, respectively.

The resulting proteins are shown in Table IV below. All the single chain molecules were assayed in CHO for cleavage of the SH3 target.

a) Cloning of the Single Chain Molecule

A series of synthetic gene assemblies was ordered from MWG-EUROFINS. Synthetic genes coding for the different single chain variants targeting SH3 were cloned in pCLS1853 using AscI and XhoI restriction sites.

b) Extrachromosomal Assay in Mammalian Cells

CHO K1 cells were transfected as described in example 1.2. 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added. After incubation at 37° C., OD was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform. Per assay, 150 ng of target vector was cotransfected with an increasing quantity of variant DNA from 3.12 to 25 ng (25 ng of single chain DNA corresponding to 12.5 ng+12.5 ng of heterodimer DNA). The transfected variant DNA quantities were thus 3.12 ng, 6.25 ng, 12.5 ng and 25 ng. The total amount of transfected DNA was brought to 175 ng (target DNA, variant DNA, carrier DNA) using an empty vector (pCLS0002).
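The dose series above implies a different amount of empty-vector carrier DNA for each point, since the total is held at 175 ng with 150 ng of target vector. The sketch below derives those carrier amounts from the figures in the text; the names are hypothetical and the carrier values are a calculation, not amounts reported by the applicant.

```python
TARGET_NG = 150.0   # target vector per assay
TOTAL_NG = 175.0    # total transfected DNA per assay
VARIANT_DOSES_NG = (3.12, 6.25, 12.5, 25.0)  # single chain variant DNA


def carrier_needed(
    variant_ng: float,
    target_ng: float = TARGET_NG,
    total_ng: float = TOTAL_NG,
) -> float:
    """Empty-vector DNA (ng) needed to bring the mix up to `total_ng`."""
    carrier = total_ng - target_ng - variant_ng
    if carrier < 0:
        raise ValueError("variant dose exceeds the available DNA budget")
    return carrier


# (variant dose, carrier amount) pairs for the four-point dose response.
transfection_mixes = [(dose, carrier_needed(dose)) for dose in VARIANT_DOSES_NG]
```

At the highest dose (25 ng) no carrier is needed, which matches the 175 ng total being exactly the target vector plus the maximal variant dose.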

c) Results

The activity of the single chain molecules against the SH3 target was monitored using the previously described CHO assay along with our internal control SCOH-RAG and I-Sce I meganucleases. All comparisons were done at 3.12 ng, 6.25 ng, 12.5 ng, and 25 ng transfected variant DNA (FIGS. 2 and 3). All the single chain molecules displayed SH3 target cleavage activity in the CHO assay as listed in Table IV.

TABLE IV

| Name | Mutations on N-terminal monomer | Mutations on C-terminal monomer | SEQ ID No. | Cleavage of SH3 in CHO cells |
|---|---|---|---|---|
| SCOH-SH3-b56-A | 7E 44A 54L 70Q 75Y 92R 96E 158R 162A | 8K 19S 30G 38R 50R 61R 70D 75N 142R | 25 | + |
| SCOH-SH3-b56-B | 7E 44A 54L 70Q 75Y 92R 96E 132V 158R 162A | 8K 19S 30G 38R 50R 61R 70D 75N 142R | 26 | + |
| SCOH-SH3-b56-C | 7E 44A 54L 70Q 75Y 92R 96E 132V 158R 162A | 8K 19S 30G 38R 50R 61R 70D 75N 132V 142R | 27 | + |
| SCOH-SH3-b56-D | 7E 44A 54L 70Q 75Y 92R 96E 158R 162A | 8K 19S 30G 38R 50R 61R 70D 75N 132V 142R | 28 | + |
| SCOH-SH3-b1-A | 7E 44A 54L 70Q 75Y 92R 96E 158R 162A | 8K 19S 30G 38R 61R 70D 75N 86D | 29 | + |
| SCOH-SH3-b1-B | 7E 44A 54L 70Q 75Y 92R 96E 132V 158R 162A | 8K 19S 30G 38R 61R 70D 75N 86D | 30 | + |
| SCOH-SH3-b1-C | 7E 44A 54L 70Q 75Y 92R 96E 132V 158R 162A | 8K 19S 30G 38R 61R 70D 75N 86D 132V | 31 | + |
| SCOH-SH3-b1-D | 7E 44A 54L 70Q 75Y 92R 96E 158R 162A | 8K 19S 30G 38R 61R 70D 75N 86D 132V | 32 | + |

Variants shared specific behaviour upon assayed dose depending on themutation profile they bear (FIGS. 2 and 3). For example, SCOH-SH3-b1-Chas a similar profile, and is even more active than. Its activityreaches the maxima at the lowest DNA quantity transfected from lowquantity to high quantity. In comparison with SCOH-SH3-b1-C, themolecule SCOH-SH3-b56-A has a maximal activity at higher DNA doses butreaches equivalent level of activity of SCOH-SH3-b1-C and our internalstandard.

All of the variants described are active and can be used for inserting transgenes into the SH3 locus.

Example 2 Engineering Meganucleases Targeting the SH4 Locus

SH4 is a locus that is present on chromosome 7. The SH4 locus comprises a 24 bp non-palindromic sequence of SEQ ID NO: 3. As shown in Table A, SH4 is located in the vicinity of a RIS disclosed in Schwarzwaelder et al. (J. Clin. Invest. 2007 117:2241). The SH4 sequence is not included in any of the CIS described in Deichman et al.

Experiments similar to those described hereabove in Example 1 were carried out to identify I-CreI heterodimers and single-chain meganucleases capable of cleaving a target sequence of SEQ ID NO: 3.

Example 2.1. Identification of Meganucleases Cleaving SH4

I-CreI variants potentially cleaving the SH4 target sequence in heterodimeric form were constructed by genetic engineering. Pairs of such variants were then co-expressed in yeast. Upon co-expression, one obtains three molecular species, namely two homodimers and one heterodimer. It was then determined whether the heterodimers were capable of cutting the SH4 target sequence of SEQ ID NO: 3.

a) Construction of Variants of the I-CreI Meganuclease Cleaving Palindromic Sequences Derived from the SH4 Target Sequence

The SH4 sequence is partially a combination of the 10AAA_P (SEQ ID NO: 4), 5ACT_P (SEQ ID NO: 16), 10AAA_P (SEQ ID NO: 4), 5GGT_P (SEQ ID NO: 17) targets shown on FIG. 4. These sequences are cleaved by previously identified meganucleases, obtained as described in International PCT Applications WO 2006/097784 and WO 2006/097853; Arnould et al., J. Mol. Biol., 2006, 355, 443-458; Smith et al., Nucleic Acids Res., 2006. Thus, SH4 should be cleaved by combinatorial variants resulting from these previously identified meganucleases.

The screening procedure was performed using methods derived from those described in Chames et al. (Nucleic Acids Res., 2005, 33, e178), Arnould et al. (J. Mol. Biol., 2006, 355, 443-458), Smith et al. (Nucleic Acids Res., 2006, 34, e149) and Arnould et al. (J Mol Biol. 2007 371:49-65) on the following two palindromic sequences: the SH4.3 sequence of SEQ ID NO: 18 and the SH4.4 sequence of SEQ ID NO: 19.

b) Construction of Target Vector

The experimental procedure is as described in Example 1.1, with the exception that an oligonucleotide corresponding to the SH4 target sequence of SEQ ID NO: 20 (5′-TGGCATACAAGTTTTTAAAACACTGTACACCATTTTGACAATCGTCTGTCA-3′) was used.

c) Co-Expression of Variants

Yeast DNA from variants cleaving the SH4.3 and SH4.4 targets in the pCLS542 and pCLS1107 expression vectors was extracted using standard protocols and was used to transform E. coli. The resulting plasmid DNA was then used to co-transform a yeast strain. Transformants were selected on synthetic medium lacking leucine and containing G418.

d) Mating of Meganucleases Coexpressing Clones and Screening in Yeast

Mating was performed using a colony gridder (QpixII, Genetix). Variants were gridded on nylon filters covering YPD plates, using a low gridding density (4-6 spots/cm²). A second gridding process was performed on the same filters to spot a second layer consisting of different reporter-harboring yeast strains for each target. Membranes were placed on solid agar YPD rich medium and incubated at 30° C. for one night to allow mating. Next, filters were transferred to synthetic medium lacking leucine and tryptophan and containing G418, with galactose (2%) as a carbon source, and incubated for five days at 37° C. to select for diploids carrying the expression and target vectors. After 5 days, filters were placed on solid agarose medium with 0.02% X-Gal in 0.5 M sodium phosphate buffer, pH 7.0, 0.1% SDS, 6% dimethyl formamide (DMF), 7 mM β-mercaptoethanol, 1% agarose, and incubated at 37° C. to monitor β-galactosidase activity. Results were analyzed by scanning, and quantification was performed using appropriate software.

e) Results

Co-expression of variants cleaving the SH4.3 target and of variants cleaving the SH4.4 target resulted in cleavage of the SH4 target in 6 cases. Functional combinations are summarized in Table V.

TABLE V
Columns: amino acid positions and residues of the I-CreI variants cleaving the SH4.3 target. Rows: amino acid positions and residues of the I-CreI variants cleaving the SH4.4 target.
SH4.4 variant / SH4.3 variant | 24V 44R 68Y 70S 75Y 77N | 24V 68A 70S 75N 77R | 24V 70D 75N 77R
24V 44Y 70S | + | + | +
24V 44Y 70S 77V | + | + | +

Example 2.2. Validation of SH4 Target Cleavage in an Extrachromosomal Model in CHO Cells

In order to identify heterodimers displaying maximal cleavage activity for the SH4 target in CHO cells, the efficiency of several combinations of variants to cut the SH4 target was assessed using an extrachromosomal assay in CHO cells. The screen in CHO cells is a single-strand annealing (SSA) based assay where cleavage of the target by the meganucleases induces homologous recombination and expression of a LagoZ reporter gene (a derivative of the bacterial lacZ gene).

a) Cloning of SH4 Target in a Vector for CHO Screen

The target was cloned as follows. An oligonucleotide of SEQ ID NO: 21, corresponding to the SH4 target sequence flanked by gateway cloning sequences, was ordered from PROLIGO (5′-TGGCATACAAGTTTTTAAAACACTGTACACCATTTTGACAATCGTCTGTCA-3′). Double-stranded target DNA, generated by PCR amplification of the single-stranded oligonucleotide, was cloned using the Gateway protocol (INVITROGEN) into the CHO reporter vector (pCLS1058). The cloned fragment was verified by sequencing (MILLEGEN).

b) Re-Cloning of Meganucleases

The ORFs of I-CreI variants cleaving the SH4.5 and SH4.6 targets obtained hereabove were sub-cloned in pCLS2437. ORFs were amplified by PCR on yeast DNA using primers of SEQ ID NO: 22 and 23 (5′-AAAAAGCAGGCTGGCGCGCCTACACAGCGGCCTTGCCACCATG-3′ and 5′-AGAAAGCTGGGTGCTAGCGCTCGAGTTATCAGTCGG-3′). PCR products were cloned in the CHO expression vector pCLS2437 using the AscI and NheI restriction sites for internal fragment replacement. Selected clones resulting from ligation and E. coli transformation steps were verified by sequencing (MILLEGEN).

c) Extrachromosomal Assay in Mammalian Cells

CHO K1 cells were transfected with Polyfect® transfection reagent according to the supplier's protocol (QIAGEN). 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added (typically 1 liter of buffer contained: 100 ml of lysis buffer (Tris-HCl 10 mM pH 7.5, NaCl 150 mM, Triton X100 0.1%, BSA 0.1 mg/ml, protease inhibitors), 10 ml of Mg 100× buffer (MgCl₂ 100 mM, β-mercaptoethanol 35%), 110 ml ONPG 8 mg/ml and 780 ml of sodium phosphate 0.1 M pH 7.5). After incubation at 37° C., OD was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform. Per assay, 150 ng of target vector was cotransfected with 12.5 ng of each of the two variants (12.5 ng of variant cleaving the palindromic SH4.3 target and 12.5 ng of variant cleaving the palindromic SH4.4 target).
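The 1 liter buffer recipe given above can be scaled to the volume actually needed. The sketch below shows one way to do so; the batch size (one 96-well plate at 150 μl per well, without dead volume) is a hypothetical example chosen for illustration and is not stated in the protocol.

```python
# Volumes (ml) of each component per 1 liter of lysis/revelation buffer,
# as listed in the text.
RECIPE_ML_PER_LITER = {
    "lysis buffer": 100.0,
    "Mg 100x buffer": 10.0,
    "ONPG 8 mg/ml": 110.0,
    "sodium phosphate 0.1 M pH 7.5": 780.0,
}


def scale_recipe(final_volume_ml):
    """Scale every component proportionally to the requested final volume."""
    total = sum(RECIPE_ML_PER_LITER.values())  # 1000 ml for the stated recipe
    return {
        name: volume * final_volume_ml / total
        for name, volume in RECIPE_ML_PER_LITER.items()
    }


# Hypothetical batch: 96 wells x 150 microliters per well, no dead volume.
plate_volume_ml = 96 * 0.150
plate_recipe = scale_recipe(plate_volume_ml)

for name, volume in plate_recipe.items():
    print(f"{name}: {volume:.3f} ml")
```

In practice some excess would normally be prepared to cover pipetting losses; that margin is left to the user.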

d) Results

The four variants shown in Table VI and described hereabove in Example 2.1 were selected for further analysis.

TABLE VI
Variant | Amino acid positions and residues of the I-CreI variants
SH4.3-MA | 24V 44R 68Y 70S 75Y 77N
SH4.3-MC | 24V 68A 70S 75N 77R
SH4.4-M1 | 24V 44Y 70S
SH4.4-M2 | 24V 44Y 70S 77V

These variants were cloned in pCLS2437. Then, I-CreI variants cleaving the SH4.3 or SH4.4 targets were assayed together as heterodimers against the SH4 target in the CHO extrachromosomal assay. Analysis of the efficiencies of cleavage and recombination of the SH4 sequence demonstrates that all tested combinations of I-CreI variants were able to transpose their cleavage activity from yeast to CHO cells without additional mutation (Table VII).

TABLE VII
Columns: I-CreI variants cleaving SH4.3. Rows: I-CreI variants cleaving SH4.4. Amino acid positions and residues are given after each variant name.
SH4.4 variant / SH4.3 variant | SH4.3-MA: 24V 44R 68Y 70S 75Y 77N | SH4.3-MC: 24V 68A 70S 75N 77R
SH4.4-M1: 24V 44Y 70S | + | +
SH4.4-M2: 24V 44Y 70S 77V | + | +

Example 2.3. Covalent Assembly as Single Chain and Improvement of Meganucleases Cleaving SH4 by Site-Directed Mutagenesis

Co-expression of the variants described in Example 2.1 leads to a high cleavage activity of the SH4 target in yeast. In addition, some of them have been validated for SH4 cleavage in a mammalian expression system (Example 2.2).

The MA×M2 SH4 heterodimer gives high cleavage activity in yeast. SH4.3-MA is a SH4.3 cutter that bears the following mutations in comparison with the I-CreI wild type sequence: 24V 44R 68Y 70S 75Y 77N. SH4.4-M2 is a SH4.4 cutter that bears the following mutations in comparison with the I-CreI wild type sequence: 24V 44Y 70S 77V.

As described in example 1.3, single chain constructs were engineered using the linker RM2, thereby resulting in the production of a single chain molecule referred to as MA-LinkerRM2-M2. During this design step, the G19S mutation was introduced in the C-terminal M2 mutant. In addition, K7E and K96E mutations were introduced into the MA mutant, and E8K and E61R mutations into the M2 mutant, in order to create a single chain molecule referred to as MA (K7E K96E)-linkerRM2-M2 (E8K E61R G19S), hereafter called the SCOH-SH4-b1 scaffold.

The Isoleucine 132 to Valine (I132V) mutation was introduced into the coding sequence of none, one or both of the N-terminal and C-terminal protein fragments.
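Introducing I132V into none, one or both monomers yields four constructs per scaffold. The following minimal sketch enumerates them for the SCOH-SH4-b1 scaffold; the mutation lists are those implied by the scaffold description above, while the helper name and the dictionary keys are purely illustrative.

```python
from itertools import product

# Mutations of the SCOH-SH4-b1 scaffold before I132V is added:
# MA (24V 44R 68Y 70S 75Y 77N) plus K7E and K96E on the N-terminal monomer,
# M2 (24V 44Y 70S 77V) plus E8K, G19S and E61R on the C-terminal monomer.
N_TERMINAL = ["7E", "24V", "44R", "68Y", "70S", "75Y", "77N", "96E"]
C_TERMINAL = ["8K", "19S", "24V", "44Y", "61R", "70S", "77V"]


def with_132v(mutations, enabled):
    """Return a copy of the mutation list, with 132V appended if enabled."""
    return list(mutations) + (["132V"] if enabled else [])


# Keys are (I132V in N-terminal monomer, I132V in C-terminal monomer).
constructs = {
    (n_has, c_has): (with_132v(N_TERMINAL, n_has), with_132v(C_TERMINAL, c_has))
    for n_has, c_has in product([False, True], repeat=2)
}

for (n_has, c_has), (n_mut, c_mut) in constructs.items():
    print(f"N-term I132V={n_has}, C-term I132V={c_has}:",
          " ".join(n_mut), "/", " ".join(c_mut))
```

The four combinations correspond to the -A to -D constructs listed for each scaffold in Table VIII.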

The same strategy was applied to a second scaffold based on the good cutters on SH4.3 (44R 68Y 70S 75Y 77N) and SH4.4 (24V 44Y 70S 77V). This scaffold is hereafter referred to as the SCOH-SH4-b56 scaffold.

The design of the derived single chain constructs is shown in Table VIII. The single chain constructs were tested in CHO for their ability to induce cleavage of the SH4 target.

a) Cloning of the Single Chain Molecule

A series of synthetic gene assemblies was performed by MWG-EUROFINS. Synthetic genes, coding for the different single chain variants targeting SH4, were cloned in pCLS1853 using AscI and XhoI restriction sites.

b) Extrachromosomal Assay in Mammalian Cells

CHO K1 cells were transfected as described hereabove. 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added. After incubation at 37° C., OD was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform. Per assay, 150 ng of target vector was cotransfected with an increasing quantity of variant DNA from 3.12 to 25 ng (25 ng of single chain DNA corresponding to 12.5 ng+12.5 ng of heterodimer DNA). The quantities of transfected variant DNA were 3.12 ng, 6.25 ng, 12.5 ng and 25 ng. The total amount of transfected DNA was completed to 175 ng (target DNA, variant DNA, carrier DNA) using an empty vector (pCLS0002).

c) Results

The single chain molecules described in Table VIII were monitored for their activity against the SH4 target using the previously described CHO assay, by comparison with our internal control SCOH-RAG and I-Sce I meganucleases. Activity was evaluated at transfected DNA doses of 3.12 ng, 6.25 ng, 12.5 ng, and 25 ng. All single chain molecules displayed activity on the SH4 target, as reported in Table VIII.

TABLE VIII
Name | Mutations on N-terminal monomer | Mutations on C-terminal monomer | SEQ ID No. | Activity on SH4 target in CHO Assay
SCOH-SH4-b56-A | 7E 44R 68Y 70S 75Y 77N 96E | 8K 19S 24V 44Y 61R 70S 77V | 33 | +
SCOH-SH4-b56-B | 7E 44R 68Y 70S 75Y 77N 96E 132V | 8K 19S 24V 44Y 61R 70S 77V | 34 | +
SCOH-SH4-b56-C | 7E 44R 68Y 70S 75Y 77N 96E 132V | 8K 19S 24V 44Y 61R 70S 77V 132V | 35 | +
SCOH-SH4-b56-D | 7E 44R 68Y 70S 75Y 77N 96E | 8K 19S 24V 44Y 61R 70S 77V 132V | 36 | +
SCOH-SH4-b1-A | 7E 24V 44R 68Y 70S 75Y 77N 96E | 8K 19S 24V 44Y 61R 70S 77V | 37 | +
SCOH-SH4-b1-B | 7E 24V 44R 68Y 70S 75Y 77N 96E 132V | 8K 19S 24V 44Y 61R 70S 77V | 38 | +
SCOH-SH4-b1-C | 7E 24V 44R 68Y 70S 75Y 77N 96E 132V | 8K 19S 24V 44Y 61R 70S 77V 132V | 39 | +
SCOH-SH4-b1-D | 7E 24V 44R 68Y 70S 75Y 77N 96E | 8K 19S 24V 44Y 61R 70S 77V 132V | 40 | +

Variants displayed specific dose-dependent behaviour depending on the mutation profile they bear (FIGS. 5 and 6). For example, SCOH-SH4-b1-C shows an activity level within the same range as the internal standard SCOH-RAG: its activity increases from low quantity to high quantity. At the assayed doses of transfected DNA, its activity is superior to that of SCOH-SH4-b56-A.

All of these variants are active at different levels of intensity and can thus be used for SH4 genome targeting.

Example 3 Detection of Cleavage Activity at the SH Loci in Human Cell Line

I-CreI variants able to efficiently cleave the SH3 and SH4 targets in yeast and in mammalian cells (CHO K1 cells) have been identified in Examples 1 and 2. The efficiency of the SH3 and SH4 meganucleases to cleave their endogenous DNA target sequences was next tested. This example demonstrates that meganucleases engineered to cleave the SH3 and SH4 target sequences cleave their cognate endogenous sites in human cells.

Repair of double-strand breaks by non-homologous end-joining (NHEJ) can generate small deletions and insertions (InDel) (FIG. 7). In nature, this error-prone mechanism can be deleterious for the cell's survival, but it provides a rapid indicator of meganuclease activity at endogenous loci.

Example 3.1: Detection of Induced Mutagenesis at the Endogenous Site

The assays based on cleavage-induced recombination in mammalian or yeast cells, which are used for screening variants with altered specificity, are described in International PCT Application WO 2004/067736; Epinat et al., Nucleic Acids Res., 2003, 31:2952-2962; Chames et al., Nucleic Acids Res., 2005, 33:e178, and Arnould et al., J. Mol. Biol., 2006, 355:443-458. These assays result in a functional LacZ reporter gene which can be monitored by standard methods.

Single chain I-CreI variants for SH3 and SH4 cloned in the pCLS1853 plasmid were used for this experiment. The day before the experiment, cells from the human embryonic kidney cell line 293-H (Invitrogen) were seeded in a 10 cm dish at a density of 1.2×10⁶ cells/dish. The following day, cells were transfected with 3 μg of an empty plasmid or a meganuclease-expressing plasmid using lipofectamine (Invitrogen). 72 hours after transfection, cells were collected and diluted (dilution 1/20) in fresh culture medium. After 7 days of culture, cells were collected and genomic DNA was extracted.

200 ng of genomic DNA were used to amplify the endogenous locus surrounding the meganuclease cleavage site by PCR amplification. A 377 bp fragment corresponding to the SH3 locus was amplified using specific PCR primers A (SEQ ID NO: 44; 5′-tgggggtcttactctgtttccc-3′) and B (SEQ ID NO: 45; 5′-aggagagtccttctttggcc-3′). A 396 bp fragment corresponding to the SH4 locus was amplified using PCR primers C (SEQ ID NO: 46; 5′-gagtgatagcataatgaaaacc-3′) and D (SEQ ID NO: 47; 5′-ctcaccataagtcaactgtctc-3′). PCR amplification was performed to obtain a fragment flanked by specific adaptor sequences (SEQ ID NO: 48; 5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3′ and SEQ ID NO: 49; 5′-CCTATCCCCTGTGTGCCTTGGCAGTCTCAG-3′) provided by the company offering the sequencing service (GATC Biotech AG, Germany) on the 454 sequencing system (454 Life Sciences). An average of 18,000 sequences was obtained from pools of 2 amplicons (500 ng each). After sequencing, different samples were identified based on barcode sequences introduced in the first of the above adaptors. Sequences were then analyzed for the presence of insertions or deletions in the cleavage site of SH3 or SH4, respectively.

Example 3.2: Results

TABLE IX
Vector expressing: | Total sequence number | InDel containing sequences | % of InDel events
SH3 meganuclease | 12841 | 56 | 0.44
Empty | 2153 | 1 | 0.05
SH4 meganuclease | 8259 | 18 | 0.22
Empty | 12811 | 3 | 0.02

The analysis of the genomic DNA extracted from cells transfected with the meganuclease targeting the SH3 locus showed that 56 out of the 12841 analyzed sequences (0.44%) contained InDel events within the recognition site of SH3. Similarly, after transfection with the meganuclease targeting the SH4 locus, 18 out of the 8259 analyzed sequences (0.22%) contained InDel events within the recognition site of SH4.

Since small deletions or insertions could be related to PCR or sequencing artefacts, the same loci were analyzed after transfection with a plasmid that does not express the meganuclease. The analysis of the SH3 and SH4 loci revealed that virtually no InDel events could be detected. Indeed, only 0.05% (1/2153) and 0.02% (3/12811) of the analyzed sequences contained mutations.
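The percentages reported in Table IX follow directly from the read counts. As a minimal sketch, they can be recomputed as shown below; the ratio over the empty-vector background is an additional illustrative figure that is not reported in the text, and it is imprecise because the background rests on only 1 and 3 events.

```python
# (InDel-containing sequences, total sequences analyzed), from Table IX.
COUNTS = {
    "SH3 meganuclease": (56, 12841),
    "SH3 empty vector": (1, 2153),
    "SH4 meganuclease": (18, 8259),
    "SH4 empty vector": (3, 12811),
}


def indel_percent(indels, total):
    """Percentage of analyzed sequences that carry an InDel."""
    return 100.0 * indels / total


percentages = {name: indel_percent(*pair) for name, pair in COUNTS.items()}

# Illustrative only: fold increase over the empty-vector background.
fold_over_background = {
    locus: percentages[f"{locus} meganuclease"] / percentages[f"{locus} empty vector"]
    for locus in ("SH3", "SH4")
}

for name, value in percentages.items():
    print(f"{name}: {value:.2f}%")
for locus, fold in fold_over_background.items():
    print(f"{locus}: about {fold:.1f}-fold over background")
```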

Moreover, the analysis of the size of the DNA insertion or deletion sequences (FIG. 8) revealed a similar type of events with a predominance of small insertions (<5 bp) and of small deletions (<10 bp).

These data demonstrate that the meganucleases engineered to target the SH3 or SH4 loci, respectively, are active in human cells and can cleave their cognate endogenous sequence. Moreover, they show that meganucleases have the ability to generate small InDel events within a sequence, which would disrupt a gene ORF and thus inactivate the corresponding gene expression product.

Example 4 Gene Targeting at the Endogenous SH3 and SH4 Loci in Human Cells

To validate the cleavage activity of engineered single-chain SH3 and SH4 meganucleases, their ability to stimulate homologous recombination at the endogenous human SH3 and SH4 loci was next evaluated. Cells were transfected with mammalian expression plasmids for single chain molecules SCOH-SH3-b1-C or SCOH-SH4-b1-C and a vector comprising a targeting construct. The vector comprising a targeting construct (also referred to as “donor repair plasmid”) was the pCLS3777 or pCLS3778 plasmid containing a 2.8 kb sequence consisting of an exogenous DNA sequence, flanked by two sequences homologous to the human SH3 or SH4 loci. The sequences homologous to the human SH3 or SH4 loci had a length of 1.5 kb. Cleavage of the native SH3 or SH4 loci by the meganuclease yields a substrate for homologous recombination, which may use the donor repair plasmid as a repair matrix. Thus, the frequency with which targeted integration occurs at the SH3 or SH4 loci is indicative of the cleavage efficiency of the genomic SH3 or SH4 target site.

Example 4.1: Material and Methods

a) Meganuclease Expression Plasmids

The meganucleases used in this example are SCOH-SH3-b1-C and SCOH-SH4-b1-C cloned in a mammalian expression vector, resulting in plasmids pCLS2697 and pCLS2705, respectively.

b) Donor Repair Plasmids

For SH3 gene targeting experiments, the donor plasmid contained:

-   as the left homology arm: a PCR-generated fragment of the SH3 locus (position 6850510 to 6852051 on chromosome 6, NC_000006.11). This fragment has a length of 1540 bp; and
-   as the right homology arm: a fragment of the SH3 locus (position 6852107 to 6853677 on chromosome 6, NC_000006.11). This fragment has a length of 1571 bp.

For SH4 gene targeting experiments, the donor plasmid contained:

-   as the left homology arm: a PCR-generated fragment of the SH4 locus (position 114972751 to 114974269 on chromosome 7, NC_000007.13). This fragment has a length of 1519 bp; and
-   as the right homology arm: a fragment of the SH4 locus (position 114974316 to 114976380 on chromosome 7, NC_000007.13). This fragment has a length of 2065 bp.

For both SH3 and SH4, the left and right homology arms were inserted upstream (using an AscI site) and downstream (using a SbfI site), respectively, of an exogenous 2.8 kb DNA fragment containing two CMV promoters and a neomycin resistance gene. The resulting plasmids are referred to as pCLS3777 (for SH3) and pCLS3778 (for SH4).

c) SH3 and SH4 Gene Targeting Experiments

Human embryonic kidney 293H cells (Invitrogen) were plated at a density of 1×10⁶ cells per 10 cm dish in complete medium (DMEM supplemented with 2 mM L-glutamine, penicillin (100 IU/ml), streptomycin (100 μg/ml), amphotericin B (Fungizone) (0.25 μg/ml) (Invitrogen-Life Science) and 10% FBS). The next day, cells were transfected with Lipofectamine 2000 transfection reagent (Invitrogen) according to the supplier's protocol. Briefly, 2 μg of the donor plasmid was co-transfected with 3 μg of single-chain meganuclease expression vector. After 72 hours of incubation at 37° C., cells were trypsinized and plated in complete medium at 10 or 100 cells per well in 96-well plates.

Once cells were 80 to 100% confluent, genomic DNA extraction was performed with the ZR-96 genomic DNA kit (Zymo Research) according to the supplier's protocol.

d) PCR Analysis of Gene Targeting Events

The gene targeting frequency was determined by PCR on genomic DNA using the following primers: 5′-CTGTGTGCTATGATCTTGCC-3′ (SH3 GHGF4; SEQ ID NO: 50) and 5′-CCTGTCTCTTGATCAGATCC-3′ (NeoR2; SEQ ID NO: 51) for SH3, and 5′-GTGGCCTCTCAGTCTGTTTA-3′ (SH4 GHGF2; SEQ ID NO: 52) and 5′-AGTCATAGCCGAATAGCCTC-3′ (NeoR5; SEQ ID NO: 53) for SH4. The PCRs result in a 2500 bp (SH3) or a 2268 bp (SH4) gene targeting specific PCR product. The SH3 GHGF4 and SH4 GHGF2 primers are forward primers located upstream of the left homology arms of the donor repair plasmids. The NeoR primers are reverse primers located in the exogenous DNA inserted between the two homology arms of the donor repair plasmid.

Example 4.2: Results

Human embryonic kidney 293H cells were co-transfected with a plasmid expressing one of the two single-chain SH3 or SH4 meganucleases and the donor repair plasmid pCLS3777 or pCLS3778. As a control for spontaneous recombination, 293H cells were also transfected with the donor repair plasmid alone. The cells were then plated at 10 or 100 cells per well in 96-well microplates. Genomic DNA derived from these cells was analyzed for gene targeting by PCR as described in Material and Methods.

In the absence of meganuclease (repair plasmid alone), no PCR positive signal was detected among the 22560 and 18800 cells (for SH3 and SH4, respectively) that were analyzed in pools of 10 or 100 cells.

In contrast to this, in the presence of the SH3 meganuclease, 12 positive clones were detected among the 18800 cells analyzed in pools of 100 cells, thereby indicating a frequency of recombination of 0.064%. In the presence of the SH4 meganuclease, 11 positives were detected among the 3760 cells analyzed in pools of 10 cells, indicating a frequency of recombination of 0.29%. The results are presented in Table X below. The recombination frequencies indicated here are underestimated because not all plated cells start dividing again. Survival upon plating can be estimated at about 33%. Therefore, the frequencies of recombination are probably underestimated by a factor of about 3.

TABLE X
Meganuclease | Cells per well | PCR+ events | Gene targeting frequency
SH3 | 100 | 12/18800 | 0.064%
SH4 | 10 | 11/3760 | 0.29%
SH4 | 100 | 15/18800 | 0.08%
None (with SH3 repair plasmid) | 100 | 0/18800 | NA
None (with SH4 repair plasmid) | 100 | 0/18800 | NA
NA: not applicable
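The frequencies in Table X, and the effect of the roughly 3-fold underestimation discussed above, can be reproduced with the short sketch below. It assumes that each PCR-positive well counts as a single targeting event and that the correction is a simple division by the estimated survival fraction; both are simplifications made for illustration, and the corrected values are not reported in the text.

```python
SURVIVAL_FRACTION = 0.33  # approximate survival upon plating, from the text

# (meganuclease, cells per well, PCR-positive wells, cells analyzed), Table X.
EXPERIMENTS = [
    ("SH3", 100, 12, 18800),
    ("SH4", 10, 11, 3760),
    ("SH4", 100, 15, 18800),
]


def targeting_frequency(positives, cells_plated):
    """Gene targeting frequency in percent of plated cells."""
    return 100.0 * positives / cells_plated


def survival_corrected(frequency_percent, survival=SURVIVAL_FRACTION):
    """Frequency in percent of cells assumed to have resumed division."""
    return frequency_percent / survival


results = []
for name, per_well, positives, cells in EXPERIMENTS:
    raw = targeting_frequency(positives, cells)
    results.append((name, per_well, raw, survival_corrected(raw)))

for name, per_well, raw, corrected in results:
    print(f"{name} ({per_well} cells/well): raw {raw:.3f}%, corrected {corrected:.2f}%")
```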

These results demonstrate that the two single chain molecules SCOH-SH3-b1-C and SCOH-SH4-b1-C are capable of inducing high levels of gene targeting at the endogenous SH3 and SH4 loci, respectively.

Example 5 Engineering Meganucleases Targeting the SH6 Locus

SH6 is a locus comprising a 24 bp non-palindromic target (TTAATACCCCGTACCTAATATTGC, SEQ ID NO: 59) that is present on chromosome 21. SH6 is located in the vicinity of a RIS disclosed in Schwarzwaelder et al. (J Clin Invest 2007:2241-9). The SH6 sequence is not included in any of the CIS described in Deichman et al.

Example 5.1. Identification of Meganucleases Cleaving SH6

I-CreI variants potentially cleaving the SH6 target sequence in heterodimeric form were constructed by genetic engineering. Pairs of such variants were then co-expressed in yeast. Upon co-expression, one obtains three molecular species, namely two homodimers and one heterodimer. It was then determined whether the heterodimers were capable of cutting the SH6 target sequence of SEQ ID NO: 59.

a) Construction of Variants of the I-CreI Meganuclease Cleaving Palindromic Sequences Derived from the SH6 Target Sequence

The SH6 sequence is partially a combination of the 10AAT_P (SEQ ID NO: 60), 5CCC_P (SEQ ID NO: 61), 10AAT_P (SEQ ID NO: 60), 5TAG_P (SEQ ID NO: 62) target sequences which are shown on FIG. 9. These sequences are cleaved by meganucleases obtained as described in International PCT applications WO 2006/097784 and WO 2006/097853, Arnould et al. (J. Mol. Biol., 2006, 355, 443-458) and Smith et al. (Nucleic Acids Res., 2006). Thus, SH6 should be cleaved by combinatorial variants resulting from these previously identified meganucleases.

Two palindromic targets, SH6.3 and SH6.4, were derived from SH6 (FIG. 9). Since SH6.3 and SH6.4 are palindromic, they should be cleaved by homodimeric proteins. Therefore, homodimeric I-CreI variants cleaving either the SH6.3 palindromic target sequence of SEQ ID NO: 63 or the SH6.4 palindromic target sequence of SEQ ID NO: 64 were constructed using methods derived from those described in Chames et al. (Nucleic Acids Res., 2005, 33, e178), Arnould et al. (J. Mol. Biol., 2006, 355, 443-458), Smith et al. (Nucleic Acids Res., 2006, 34, e149) and Arnould et al. (J Mol Biol. 2007 371:49-65).
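One generic way to derive two palindromic sequences from a 24 bp non-palindromic target is to mirror each 12 bp half onto its own reverse complement. The sketch below illustrates this construction on the SH6 sequence. It is an assumption about how such targets can be built, not a statement of how SH6.3 and SH6.4 were designed; the actual sequences are those of SEQ ID NO: 63 and 64, which are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_derivatives(target):
    """Mirror each half of an even-length target into a palindromic sequence."""
    if len(target) % 2:
        raise ValueError("target length must be even")
    half = len(target) // 2
    left, right = target[:half], target[half:]
    left_palindrome = left + reverse_complement(left)
    right_palindrome = reverse_complement(right) + right
    return left_palindrome, right_palindrome


SH6 = "TTAATACCCCGTACCTAATATTGC"  # SEQ ID NO: 59, as given in the text

left_derived, right_derived = palindromic_derivatives(SH6)
print(left_derived)
print(right_derived)
```

Each derived sequence equals its own reverse complement, which is the property that allows cleavage by a homodimeric protein.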

b) Construction of Target Vector

The experimental procedure is as described in Example 1.1, with the exception that an oligonucleotide corresponding to the SH6 target sequence (5′-TGGCATACAAGTTTTTAATACCCCGTACCTAATATTGCCAATCGTCTGTCA-3′; SEQ ID NO: 65) was used.
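As a quick consistency check, the 24 bp SH6 target of SEQ ID NO: 59 can be located inside the oligonucleotide of SEQ ID NO: 65, which also reveals the flanking sequences. The sketch below uses only the two sequences given in the text; the function name is illustrative.

```python
SH6_TARGET = "TTAATACCCCGTACCTAATATTGC"  # SEQ ID NO: 59
SH6_OLIGO = "TGGCATACAAGTTTTTAATACCCCGTACCTAATATTGCCAATCGTCTGTCA"  # SEQ ID NO: 65


def split_flanks(oligo, target):
    """Return the sequences flanking the first occurrence of target in oligo."""
    start = oligo.find(target)
    if start < 0:
        raise ValueError("target not found in oligonucleotide")
    end = start + len(target)
    return oligo[:start], oligo[end:]


left_flank, right_flank = split_flanks(SH6_OLIGO, SH6_TARGET)
print(f"5' flank: {left_flank} ({len(left_flank)} nt)")
print(f"3' flank: {right_flank} ({len(right_flank)} nt)")
```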

c) Co-Expression of Variants

Yeast DNA was extracted from variants cleaving the SH6.3 and SH6.4 targets in the pCLS542 and pCLS1107 expression vectors using standard protocols and was used to transform E. coli. Transformants were selected on synthetic medium lacking leucine and containing G418.

d) Mating of Meganucleases Coexpressing Clones and Screening in Yeast

Mating was performed using a colony gridder (QpixII, Genetix). Variants were gridded on nylon filters covering YPD plates, using a low gridding density (4-6 spots/cm²). A second gridding process was performed on the same filters to spot a second layer consisting of different reporter-harboring yeast strains for each target. Membranes were placed on solid agar YPD rich medium and incubated at 30° C. for one night to allow mating. Next, filters were transferred to synthetic medium lacking leucine and tryptophan and containing G418, with galactose (2%) as a carbon source, and incubated for five days at 37° C. to select for diploids carrying the expression and target vectors. After 5 days, filters were placed on solid agarose medium with 0.02% X-Gal in 0.5 M sodium phosphate buffer, pH 7.0, 0.1% SDS, 6% dimethyl formamide (DMF), 7 mM β-mercaptoethanol, 1% agarose, and incubated at 37° C. to monitor β-galactosidase activity. Results were analyzed by scanning, and quantification was performed using appropriate software.

e) Results

Co-expression of ten variants cleaving the SH6.4 target and of two variants cleaving the SH6.3 target resulted in cleavage of the SH6.1 target in all but two cases. These two cases corresponded to combinations for which double transformants were not obtained. Functional combinations are summarized in Table XI.

TABLE XI
Columns: amino acid positions and residues of the I-CreI variants cleaving the SH6.3 target. Rows: amino acid positions and residues of the I-CreI variants cleaving the SH6.4 target.
SH6.4 variant / SH6.3 variant | 44K 68T 70G 75N | 44K 70S 75N
28Q 40R 44A 70L 75N 96R 111H 144S | + | +
7R 28Q 40R 44A 70L 75N 85R 103T | + | +
28Q 40R 44A 70L 75N 103S | + | +
24F 27V 28Q 40R 44A 70L 75N 99R | + | +
7R 28Q 40R 44A 70L 75N 81T | + | +
7R 28Q 40R 44A 70L 75N 77V | Not tested | +
7R 28Q 40R 44A 70L 75N 103T 121E 132V 160R | + | +
28Q 40R 44A 70L 75N | Not tested | +
7R 28Q 40R 44A 70L 75N 103T | + | +
28Q 34R 40R 44A 70L 75N 81V 103T 108V 160E | + | +
+ indicates a functional combination

Example 5.2. Validation of SH6 Target Cleavage in an Extrachromosomal Model in CHO Cells

I-CreI variants able to efficiently cleave the SH6 target in yeast when forming heterodimers are described hereabove in example 5.1. In order to identify heterodimers displaying maximal cleavage activity for the SH6 target in CHO cells, the efficiency of some of these variants was compared using an extrachromosomal assay in CHO cells. The screen in CHO cells is a single-strand annealing (SSA) based assay where cleavage of the target by the meganucleases induces homologous recombination and expression of a LagoZ reporter gene (a derivative of the bacterial lacZ gene).

a) Cloning of SH6 Target in a Vector for CHO Screen

The target was cloned as follows: an oligonucleotide corresponding to the SH6 target sequence flanked by gateway cloning sequences was ordered from PROLIGO (5′-TGGCATACAAGTTTTTAATACCCCGTACCTAATATTGCCAATCGTCTGTCA-3′; SEQ ID NO: 65). Double-stranded target DNA, generated by PCR amplification of the single-stranded oligonucleotide, was cloned using the Gateway protocol (INVITROGEN) into the CHO reporter vector (pCLS1058). The cloned target was verified by sequencing (MILLEGEN).

b) Re-Cloning of Meganucleases

The ORFs of I-CreI variants cleaving the SH6.3 and SH6.4 targets identified in example 5.1 were sub-cloned in pCLS2437. ORFs were amplified by PCR on yeast DNA using the following primers: 5′-AAAAAGCAGGCTGGCGCGCCTACACAGCGGCCTTGCCACCATG-3′ (SEQ ID NO: 66) and 5′-AGAAAGCTGGGTGCTAGCGCTCGAGTTATCAGTCGG-3′ (SEQ ID NO: 67). PCR products were cloned in the CHO expression vector pCLS2437 using the AscI and XhoI restriction sites for internal fragment replacement. Selected clones resulting from ligation and E. coli transformation steps were verified by sequencing (MILLEGEN).

c) Extrachromosomal Assay in Mammalian Cells

CHO K1 cells were transfected with Polyfect® transfection reagent according to the supplier's protocol (QIAGEN). 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added (typically 1 liter of buffer contained: 100 ml of lysis buffer (Tris-HCl 10 mM pH 7.5, NaCl 150 mM, Triton X100 0.1%, BSA 0.1 mg/ml, protease inhibitors), 10 ml of Mg 100× buffer (MgCl₂ 100 mM, β-mercaptoethanol 35%), 110 ml ONPG 8 mg/ml and 780 ml of sodium phosphate 0.1 M pH 7.5). After incubation at 37° C., OD was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform. Per assay, 150 ng of target vector was cotransfected with 12.5 ng of each of the two variants (12.5 ng of variant cleaving the palindromic SH6.3 target and 12.5 ng of variant cleaving the palindromic SH6.4 target).

d) Results

One pair of variants forming a heterodimeric endonuclease able to cleave SH6 in yeast was chosen for confirmation in CHO using the extrachromosomal assay in a transient transfection.

The monomer capable of cleaving SH6.3 comprised the following mutations: 44K 70S 75N (referred to as SH6-3-M1-44K 70S 75N) and the monomer capable of cleaving SH6.4 comprised the following mutations: 28Q 40R 44A 70L 75N 96R 111H 144S (referred to as SH6-4-MB-28Q 40R 44A 70L 75N 96R 111H 144S).

Analysis of the efficiencies of cleavage and recombination of the SH6 sequence demonstrates that the tested combination of I-CreI variants was able to transpose its cleavage activity from yeast to CHO cells without additional mutation.

Example 5.3. Covalent Assembly as Single Chain and Improvement of Meganucleases Cleaving SH6

Co-expression of the cutters described in example 5.1 leads to a high cleavage activity of the SH6 target in yeast. One of these combinations has been validated for SH6 cleavage in a mammalian expression system (example 5.2).

The M1×MA SH6 heterodimer gives high cleavage activity in yeast. M1 is a SH6.3 cutter that bears the following mutations in comparison with the I-CreI wild type sequence: 44K 70S 75N. MA is a SH6.4 cutter that bears the following mutations in comparison with the I-CreI wild type sequence: 7R 28Q 40R 44A 70L 75N 103T 121E 132V 160R.

Single chain constructs were engineered using the linker RM2 (AAGGSDKYNQALSKYNQALSKYNQALSGGGGS; SEQ ID NO: 15), resulting in the production of the single chain molecule MA-RM2-M1. During this design step, the G19S mutation was introduced in the C-terminal M1 mutant. In addition, the K96E mutation was introduced into the MA mutant and the E8K and E61R mutations into the M1 mutant to create the single chain molecule MA(K96E)-RM2-M1(E8K E61R), hereafter called the SCOH-SH6 b1 scaffold.

Four additional amino-acid substitutions have been found in previous studies to enhance the activity of I-CreI derivatives: these mutations correspond to the replacement of Phenylalanine 54 with Leucine (F54L), Glutamic acid 80 with Lysine (E80K), Valine 105 with Alanine (V105A) and Isoleucine 132 with Valine (I132V). Some combinations were introduced into the coding sequence of the N-terminal and C-terminal protein fragments, and the first batch of resulting proteins was assayed for its ability to induce cleavage of the SH6 target.

a) Introduction of Additional Mutations into the SC-OH Single Chain Construct

Additional mutations were introduced using the QuikChange Multi Site-Directed Mutagenesis Kit from Stratagene/Agilent Technologies Inc according to the manufacturer's instructions. A first set of oligonucleotides was used to introduce the mutations in the part of the single chain molecule corresponding to the first monomer. A second set of oligonucleotides was designed to introduce the same mutations specifically in the second part of the single chain molecule corresponding to the second monomer (see Table XII).

TABLE XII

| SEQ ID NO: | Name | Sequence |
|---|---|---|
| Oligonucleotides used for mutagenesis of the first monomer | | |
| 68 | F54LFor | ACCCAGCGCCGTTGGCTGCTGGACAAACTAGTG |
| 69 | F54LRev | CACTAGTTTGTCCAGCAGCCAACGGCGCTGGGT |
| 70 | 103T_105AFor | AAACAGGCAACCCTGGCTCTGAAAATTATCGAA |
| 71 | 103T_105ARev | TTCGATAATTTTCAGAGCCAGGGTTGCCTGTTT |
| Oligonucleotides used for mutagenesis of the second monomer | | |
| 72 | F54Lmono2_For | CACAAAGAAGGTGGTTGTTGGACAAATTGGTT |
| 73 | F54Lmono2_Rev | AACCAATTTGTCCAACAACCACCTTCTTTGTG |
| 74 | E80Kmono2_For | TGTCTAAAATTAAGCCTCTTCATAACTTTCTC |
| 75 | E80Kmono2_Rev | GAGAAAGTTATGAAGAGGCTTAATTTTAGACA |
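The mutagenesis oligonucleotides of Table XII come in forward/reverse pairs. A minimal sketch of a consistency check is given below: it tests whether each reverse oligonucleotide is the exact reverse complement of its forward partner. The pair labels and function name are chosen for this illustration only; the sequences are copied from Table XII.

```python
# Translation table for complementing an unambiguous DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# (forward, reverse) oligonucleotides from Table XII, SEQ ID NO: 68-75.
PRIMER_PAIRS = {
    "F54L": ("ACCCAGCGCCGTTGGCTGCTGGACAAACTAGTG",
             "CACTAGTTTGTCCAGCAGCCAACGGCGCTGGGT"),
    "103T_105A": ("AAACAGGCAACCCTGGCTCTGAAAATTATCGAA",
                  "TTCGATAATTTTCAGAGCCAGGGTTGCCTGTTT"),
    "F54Lmono2": ("CACAAAGAAGGTGGTTGTTGGACAAATTGGTT",
                  "AACCAATTTGTCCAACAACCACCTTCTTTGTG"),
    "E80Kmono2": ("TGTCTAAAATTAAGCCTCTTCATAACTTTCTC",
                  "GAGAAAGTTATGAAGAGGCTTAATTTTAGACA"),
}

# Collect any pair whose reverse oligo is not the reverse complement of the forward one.
mismatched = [name for name, (fwd, rev) in PRIMER_PAIRS.items()
              if reverse_complement(fwd) != rev]
print("mismatched pairs:", mismatched)
```

A check of this kind is a cheap way to catch transcription errors in a sequence listing before oligonucleotides are ordered.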

Isolated clones obtained at the end of this process were sequenced to confirm the specific mutation profiles obtained. Profiles of interest were then tested in the CHO SSA assay in comparison with the initial construct as described.

b) Extrachromosomal Assay in Mammalian Cells

CHO K1 cells were transfected as described above. 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added. After incubation at 37° C., OD was measured at 420 nm. The entire process is performed on an automated Velocity11 BioCel platform.

Per assay, 150 ng of target vector was cotransfected with an increasing quantity of variant DNA from 3.12 ng to 25 ng (25 ng of single chain DNA corresponding to 12.5 ng+12.5 ng of heterodimer DNA). The quantities of transfected variant DNA were 3.12 ng, 6.25 ng, 12.5 ng and 25 ng. The total amount of transfected DNA was completed to 175 ng (target DNA, variant DNA, carrier DNA) using empty vector (pCLS0001).
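The DNA amounts above can be sketched as a small calculation. The example assumes that the variant doses form a two-fold dilution series starting at 25 ng (so that the lowest dose is 3.125 ng, reported as 3.12 ng in the text) and that carrier DNA makes up the difference to 175 ng; the function names are hypothetical.

```python
TARGET_NG = 150.0  # target vector per assay
TOTAL_NG = 175.0   # total transfected DNA per assay


def dose_series(top_ng=25.0, steps=4):
    """Two-fold dilution series of variant DNA, in ascending order (assumed design)."""
    return sorted(top_ng / 2 ** i for i in range(steps))


def carrier_ng(variant_ng):
    """Empty-vector (carrier) DNA needed to bring the total to TOTAL_NG."""
    carrier = TOTAL_NG - TARGET_NG - variant_ng
    if carrier < 0:
        raise ValueError("variant DNA exceeds the available DNA budget")
    return carrier


for variant in dose_series():
    print(f"variant {variant:6.3f} ng -> carrier {carrier_ng(variant):6.3f} ng")
```

Under these assumptions the highest dose (25 ng) needs no carrier DNA, and the lowest dose needs about 21.9 ng.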

c) Results

The activity of the SCOH-SH6-b1-C (pCLS2796) and SCOH-SH6-b1-B (pCLS2928) single chain molecules (see Table XIII) against the SH6 target was monitored using the previously described CHO assay by comparison to the heterodimer formed by SH6.3-M1×SH6.4-MB and to our internal controls, the SCOH-RAG and I-SceI meganucleases. All comparisons were done at 3.12 ng, 6.25 ng, 12.5 ng, and 25 ng of transfected variant DNA (FIG. 10). The two single chain meganucleases were able to cleave the SH6 target more efficiently than the starting heterodimer. The activity of the best molecule, SCOH-SH6-b1-C, was further improved by introducing additional mutations among those described above in a new batch of meganucleases.

TABLE XIII

| Name | Mutations on N-terminal segment | Mutations on C-terminal segment | SEQ ID NO: | SH6 cleavage activity in CHO |
|---|---|---|---|---|
| SCOH-SH6-b1-B | 7R 28Q 40R 44A 70L 75N 96E 103T 121E 132V 160R | 8K 19S 44K 61R 70S 75N | 76 | + |
| SCOH-SH6-b1-C | 7R 28Q 40R 44A 70L 75N 96E 103T 121E 132V 160R | 8K 19S 44K 61R 70S 75N 132V | 77 | + |

Additional mutations were further introduced into the single chain scaffold as described in Materials and Methods. The molecules obtained and tested are listed in Table XIV.

TABLE XIV

| Name | Mutations on N-terminal segment | Mutations on C-terminal segment | SEQ ID NO: | SH6 cleavage activity in CHO |
|---|---|---|---|---|
| SCOH-SH6-b1-C | 7R 28Q 40R 44A 70L 75N 96E 103T 121E 132V 160R | 8K 19S 44K 61R 70S 75N 132V | 78 | + |
| QCSH61-A01 | 7R 28Q 40R 44A 70L 75N 96E 103T 105A 121E 132V 160R | 8K 19S 44K 61R 70S 75N 132V | 79 | + |
| QCSH61-E01 | 7R 28Q 40R 44A 70L 75N 96E 103T 121E 132V 160R | 8K 19S 44K 54L 61R 70S 75N 132V | 80 | + |
| QCSH61-H01a | 7R 28Q 40R 44A 70L 75N 96E 103T 105A 121E 132V 160R | 8K 19S 44K 54L 61R 70S 75N 80K 132V | 81 | + |
| QCSH61-H01b | 7E 28Q 40R 44A 70L 75N 96E 103T 105A 121E 132V 160R | 8K 19S 44K 54L 61R 70S 75N 80K 132V | 83 | + |
| QCSH61-H01c | 7R 28Q 40R 44A 70L 75N 96E 103T 105A 121E 132V 160R | 8K 19S 44K 54L 61R 80K 132V | 84 | + |
| QCSH61-H01d | 7E 28Q 40R 44A 70L 75N 96E 103T 105A 121E 132V 160R | 8K 19S 44K 54L 61R 80K 132V | 85 | + |
| QCSH62-A02 | 7R 28Q 40R 44A 54L 70L 75N 96E 103T 121E 132V 160R | 8K 19S 44K 61R 70S 75N 132V | 82 | + |

All the variants were active under the described conditions, with a dose-response behaviour that depended on the mutation profile they bear (FIG. 10). For example, QCSH61-H01a, b, c and d have a profile similar to that of our internal standard SCOH-RAG: they are very active molecules even at low doses. All of these variants could be used for SH6 genome targeting.

Example 6 Gene Targeting at the Endogenous SH6 Loci in Human Cells

To validate the cleavage activity of engineered single-chain SH6 meganucleases, their ability to stimulate homologous recombination at the endogenous human SH6 locus was evaluated. Cells were transfected with mammalian expression plasmids for single chain molecules SCOH-QCSH6-H01 (SEQ ID NO: 81; pCLS3690) or SCOH-QC-SH6-H01-V2-7E-70R75D (SEQ ID NO: 85; pCLS4373) and the donor repair plasmid pCLS3779 (FIG. 13; SEQ ID NO: 279) containing 2.8 kb of exogenous DNA sequence flanked by two sequences, both 1.5 kb in length, homologous to the human SH6 locus. Cleavage of the native SH6 locus by the meganuclease yields a substrate for homologous recombination, which may use the donor repair plasmid as a repair matrix. Thus, the frequency with which targeted integration occurs at the SH6 locus is indicative of the cleavage efficiency of the genomic SH6 target site.

Example 6.1. Materials and Methods

a) Meganuclease Expression Plasmids

The meganucleases used in this example are SCOH-QCSH6-H01 (SEQ ID NO: 81) and SCOH-QC-SH6-H01-V2-7E-70R75D (SEQ ID NO: 85), each cloned in a mammalian expression vector, resulting in plasmids pCLS3690 (FIG. 13) and pCLS4373, respectively.

b) Donor Repair Plasmid

The donor plasmid contains a PCR generated 1517 bp fragment of the SH6 locus (position 18437771 to 18439287 on chromosome 21, NC_000021.8) as the left homology arm and a 1571 bp fragment of the SH6 locus (position 18439343 to 18440846 on chromosome 21, NC_000021.8) as the right homology arm. The left and right homology arms were inserted upstream (using an AscI site) and downstream (using an SbfI site), respectively, of an exogenous 2.8 kb DNA fragment containing two CMV promoters and a neomycin resistance gene. The resulting plasmid is pCLS3779 (FIG. 13; SEQ ID NO: 279).
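The homology arm coordinates can be related to fragment lengths with a short calculation. The sketch below assumes 1-based, inclusive chromosome coordinates; under that assumption the left arm spans 1517 positions, in agreement with the stated length, whereas the right arm coordinates as printed span 1504 positions rather than the stated 1571 bp. The text does not indicate which of the two right arm figures is intended, so the example only reports what the coordinates imply.

```python
def span_bp(start, end):
    """Length of a genomic interval, assuming 1-based inclusive coordinates."""
    return end - start + 1


# Coordinates on chromosome 21 as given in the text.
LEFT_ARM = (18437771, 18439287)
RIGHT_ARM = (18439343, 18440846)

left_len = span_bp(*LEFT_ARM)
right_len = span_bp(*RIGHT_ARM)
# Genomic positions lying between the two arms, i.e. not covered by either arm.
gap_len = RIGHT_ARM[0] - LEFT_ARM[1] - 1

print(f"left arm: {left_len} bp, right arm: {right_len} bp, gap: {gap_len} bp")
```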

c) Sh6 Gene Targeting Experiments

Human embryonic kidney 293H cells (Invitrogen) were plated at a density of 1×10⁶ cells per 10 cm dish in complete medium (DMEM supplemented with 2 mM L-glutamine, penicillin (100 IU/ml), streptomycin (100 μg/ml), amphotericin B (Fungizone) (0.25 μg/ml) (Invitrogen-Life Science) and 10% FBS). The next day, cells were transfected with Lipofectamine 2000 transfection reagent (Invitrogen) according to the supplier's protocol. Briefly, 2 μg of the donor plasmid was co-transfected with 3 μg of single-chain meganuclease expression vector. After 72 hours of incubation at 37° C., cells were trypsinized and plated in complete medium at 10 or 100 cells per well in 96-well plates. Alternatively, after 72 hours of incubation at 37° C., cells were trypsinized and plated in complete medium at 300 cells per dish in 10 cm dishes. After 2 weeks of incubation at 37° C., individual clonal cellular colonies were picked and plated in complete medium in 96-well plates. Once cells were 80 to 100% confluent, genomic DNA extraction was performed with the ZR-96 genomic DNA kit (Zymo Research) according to the supplier's protocol.

d) PCR Analysis of Gene Targeting Events

The frequency of gene targeting was determined by PCR on genomic DNA using the primers SH6 GHGF3: 5′-CAATGGAGTTTTGGAGCCAC-3′ (SEQ ID NO: 280) and NeoR9: 5′-ATCAGAGCAGCCGATTGTCT-3′ (SEQ ID NO: 281). The PCRs result in a 2300 bp gene targeting specific PCR product (FIG. 14). The SH6 GHGF3 primer is a forward primer located upstream of the left homology arm of the donor repair plasmid. The NeoR9 primer is a reverse primer located in the exogenous DNA inserted between the two homology arms of the donor repair plasmid.

Example 6.2. Results

Human embryonic kidney 293H cells were co-transfected with 2 vectors: a plasmid expressing one of the two single-chain SH6 meganucleases and the donor repair plasmid pCLS3779 (FIG. 13; SEQ ID NO: 279). As a control for spontaneous recombination, 293H cells were also transfected with the donor repair plasmid alone. The cells were then plated at 10 or 100 cells per well in 96-well microplates or at 300 cells per 10 cm dish, and 2 weeks later clonal colonies were isolated and plated in 96-well microplates. Genomic DNA derived from these cells was analyzed for gene targeting by PCR as described in Materials and Methods. In the absence of meganuclease (repair plasmid alone), 5 PCR-positive signals were detected among the 67680 cells analyzed in pools of 10 or 100 cells, indicating a frequency of spontaneous recombination of 0.007%. In contrast, in the presence of the SCOH-QCSH6-H01 (SEQ ID NO: 81; pCLS3690) or SCOH-QC-SH6-H01-V2-7E-70R75D (SEQ ID NO: 85; pCLS4373) meganucleases, 177 and 35 positives were detected among the 73320 and 18800 cells analyzed in pools of 10 or 100 cells, indicating a frequency of recombination of 0.24% and 0.19%, respectively. Results are presented in Table XV. These results demonstrate that the two single chain molecules SCOH-QCSH6-H01 (SEQ ID NO: 81; pCLS3690) and SCOH-QC-SH6-H01-V2-7E-70R75D (SEQ ID NO: 85; pCLS4373) are capable of inducing high levels of gene targeting at the endogenous sh6 locus.

TABLE XV Frequency of gene targeting events at the sh6 locus in human 293H cells

| Meganuclease | Cells per well | PCR+ events | Gene targeting frequency |
|---|---|---|---|
| SCOH-QCSH6-H01 (SEQ ID NO: 81) | 100 | 151/65800 | 0.23% |
| SCOH-QC-SH6-H01-V2-7E-70R75D (SEQ ID NO: 85) | 100 | 35/18800 | 0.19% |
| None (with SH6 repair plasmid) | 100 | 5/56400 | 0.009% |
| SCOH-QCSH6-H01 (SEQ ID NO: 81) | 10 | 26/7520 | 0.35% |
| None (with SH6 repair plasmid) | 10 | 0/11280 | NA |
| SCOH-QCSH6-H01 (SEQ ID NO: 81) | monoclonal | 9/650 | 1.38% |
| SCOH-QC-SH6-H01-V2-7E-70R75D (SEQ ID NO: 85) | monoclonal | 2/116 | 1.72% |
| None (with SH6 repair plasmid) | monoclonal | 0/752 | NA |

NA: not applicable
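The frequencies reported for the pooled experiments can be recomputed from the counts in Table XV. The sketch below is illustrative only: it treats each PCR-positive signal as one targeting event and divides by the number of cells analyzed, which is how the percentages in the text appear to have been derived. The data structure and function names are hypothetical.

```python
# (meganuclease, plating format, PCR-positive events, cells analyzed), from Table XV.
TABLE_XV = [
    ("SCOH-QCSH6-H01", "100 cells/well", 151, 65800),
    ("SCOH-QC-SH6-H01-V2-7E-70R75D", "100 cells/well", 35, 18800),
    ("None", "100 cells/well", 5, 56400),
    ("SCOH-QCSH6-H01", "10 cells/well", 26, 7520),
    ("None", "10 cells/well", 0, 11280),
    ("SCOH-QCSH6-H01", "monoclonal", 9, 650),
    ("SCOH-QC-SH6-H01-V2-7E-70R75D", "monoclonal", 2, 116),
    ("None", "monoclonal", 0, 752),
]


def frequency_percent(positives, cells):
    """Gene targeting frequency as a percentage of cells analyzed."""
    return 100.0 * positives / cells


def pooled_frequency(meganuclease):
    """Sum the 10- and 100-cell pool experiments for one meganuclease."""
    rows = [r for r in TABLE_XV if r[0] == meganuclease and r[1] != "monoclonal"]
    positives = sum(r[2] for r in rows)
    cells = sum(r[3] for r in rows)
    return positives, cells, frequency_percent(positives, cells)


for name in ("SCOH-QCSH6-H01", "SCOH-QC-SH6-H01-V2-7E-70R75D", "None"):
    positives, cells, freq = pooled_frequency(name)
    print(f"{name}: {positives}/{cells} = {freq:.3f}%")
```

Summing the pooled rows reproduces the totals quoted in the results (177/73320, 35/18800 and 5/67680).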

Example 7 Transgene Expression after Gene Targeting at the Endogenous Sh6 Loci in Human Cells

To validate the capacity of the sh6 locus to support transgene expression following cleavage by engineered single-chain SH6 meganucleases, gene targeting experiments were conducted with a repair plasmid containing a neomycin-resistance gene expression cassette, and the ability of modified cells to grow in Neomycin-containing media was measured. The survival and growth of cells in the presence of Neomycin is dependent on the expression of the neomycin-resistance gene and is therefore indicative of transgene expression at the SH6 locus following targeted integration.

Example 7.1. Materials and Methods

a) Meganuclease Expression Plasmids

The meganuclease used in this example is SCOH-QCSH6-H01 (SEQ ID NO: 81) cloned in a mammalian expression vector, resulting in plasmid pCLS3690.

b) Donor Repair Plasmid

The donor plasmid contains a PCR generated 1517 bp fragment of the SH6 locus (position 18437771 to 18439287 on chromosome 21, NC_000021.8) as the left homology arm and a 1571 bp fragment of the SH6 locus (position 18439343 to 18440846 on chromosome 21, NC_000021.8) as the right homology arm. The left and right homology arms were inserted upstream (using an AscI site) and downstream (using an SbfI site), respectively, of an exogenous 2.8 kb DNA fragment containing two CMV promoters and a neomycin resistance gene. The resulting plasmid is pCLS3779 (FIG. 13; SEQ ID NO: 279).

c) Sh6 Gene Targeting Experiments

Human embryonic kidney 293H cells (Invitrogen) were plated at a density of 1×10⁶ cells per 10 cm dish in complete medium (DMEM supplemented with 2 mM L-glutamine, penicillin (100 IU/ml), streptomycin (100 μg/ml), amphotericin B (Fungizone) (0.25 μg/ml) (Invitrogen-Life Science) and 10% FBS). The next day, cells were transfected with Lipofectamine 2000 transfection reagent (Invitrogen) according to the supplier's protocol. Briefly, 2 μg of the donor plasmid was co-transfected with 3 μg of single-chain meganuclease expression vector. After 72 hours of incubation at 37° C., cells were trypsinized and plated in complete medium at 300 cells per dish in 10 cm dishes. After 2 weeks of incubation at 37° C., individual clonal cellular colonies were picked and plated in complete medium in 96-well plates. After one week of incubation at 37° C., cells were trypsinized, plated into 2 replicate 96-well plates and incubated at 37° C. Once cells were 80 to 100% confluent, genomic DNA extraction was performed on one of the replicate plates with the ZR-96 genomic DNA kit (Zymo Research) according to the supplier's protocol. The other replicate was used to isolate gene-targeted clones and expand them.

d) PCR Identification of Gene Targeted Clones

Gene targeting was determined by PCR on genomic DNA using the primers SH6 GHGF3: 5′-CAATGGAGTTTTGGAGCCAC-3′ (SEQ ID NO: 280) and NeoR9: 5′-ATCAGAGCAGCCGATTGTCT-3′ (SEQ ID NO: 281). The PCRs result in a 2300 bp gene targeting specific PCR product (FIG. 14). The SH6 GHGF3 primer is a forward primer located upstream of the left homology arm of the donor repair plasmid. The NeoR9 primer is a reverse primer located in the exogenous DNA inserted between the two homology arms of the donor repair plasmid.

e) Validation of Targeted Integration by Southern Blot:

Genomic DNA from cellular clones was digested with StuI or HindIII restriction enzymes (New England Biolabs), separated by electrophoresis on a 0.8% agarose gel and transferred onto a nitrocellulose membrane. A DNA probe was prepared from 25 ng of a DNA fragment homologous to the Neomycin resistance gene with ³²P-radiolabeled dCTP and the Rediprime II random prime labelling system (GE Healthcare) according to the supplier's protocol and added to the nitrocellulose membrane that had been preincubated in hybridization buffer (NaPi 20 mM, 7% SDS, 1 mM EDTA). After overnight incubation at 65° C., the membrane was washed and exposed to a radiography film. The expected band sizes on the radiograph are 5.3 kb for StuI digestion and 6.8 kb for HindIII digestion (FIG. 15).

f) Neomycin-Resistance Test:

Cellular clones identified by PCR as targeted at the SH6 locus were plated at 300 cells per well in 96-well microplates in the presence of G418 antibiotic (PAA laboratories). After 10 days of incubation at 37° C., viability was measured using the Vialight bioassay kit (Lonza) and a Victor luminescence reader (Perkin Elmer) according to the supplier's protocol.

Example 7.2. Results

Human embryonic kidney 293H cells were co-transfected with 2 vectors: a plasmid expressing one of the two single-chain SH6 meganucleases and the donor repair plasmid pCLS3779. The cells were then plated at 300 cells per 10 cm dish and 2 weeks later clonal colonies were isolated and plated in 96-well microplates. Genomic DNA derived from these cells was analyzed for gene targeting by PCR as described in Materials and Methods. Genomic DNA was then used to validate targeted integration by Southern blot analysis. Clones number 7 and 8 showed bands of the expected size whereas negative control clones number 5 and 6 did not (FIG. 16). Those cellular clones were tested for their ability to survive in the presence of G418 (PAA laboratories). Only clones with targeted integration (number 7 and 8) showed resistance to G418 at concentrations above 0.4 mg/ml (FIG. 16). This indicates that targeted integration at the sh6 locus can support functional transgene expression.

Example 8 Neighboring Gene Expression after Gene Targeting at the Endogenous sh6 Loci in Human Cells

To validate the capacity of the sh6 locus to support transgene integration without disturbing the expression of neighboring genes, gene targeting experiments were conducted with a repair plasmid containing a 2.8 kb exogenous DNA fragment and cellular clones were identified that contained the targeted integration. The expression of genes upstream and downstream of the sh6 integration site was measured and compared to that of cellular clones that had not undergone targeted integration.

Example 8.1. Materials and Methods

a) Meganuclease Expression Plasmids

The meganuclease used in this example is SCOH-QCSH6-H01 (SEQ ID NO: 81) cloned in a mammalian expression vector, resulting in plasmid pCLS3690.

b) Donor Repair Plasmid

The donor plasmid contains a PCR generated 1517 bp fragment of the SH6 locus (position 18437771 to 18439287 on chromosome 21, NC_000021.8) as the left homology arm and a 1571 bp fragment of the SH6 locus (position 18439343 to 18440846 on chromosome 21, NC_000021.8) as the right homology arm. The left and right homology arms were inserted upstream (using an AscI site) and downstream (using an SbfI site), respectively, of an exogenous 2.8 kb DNA fragment containing two CMV promoters and a neomycin resistance gene. The resulting plasmid is pCLS3779 (FIG. 13; SEQ ID NO: 279).

c) Sh6 Gene Targeting Experiments

Human embryonic kidney 293H cells (Invitrogen) were plated at a density of 1×10⁶ cells per 10 cm dish in complete medium (DMEM supplemented with 2 mM L-glutamine, penicillin (100 IU/ml), streptomycin (100 μg/ml), amphotericin B (Fungizone) (0.25 μg/ml) (Invitrogen-Life Science) and 10% FBS). The next day, cells were transfected with Lipofectamine 2000 transfection reagent (Invitrogen) according to the supplier's protocol. Briefly, 2 μg of the donor plasmid was co-transfected with 3 μg of single-chain meganuclease expression vector. After 72 hours of incubation at 37° C., cells were trypsinized and plated in complete medium at 300 cells per dish in 10 cm dishes. After 2 weeks of incubation at 37° C., individual clonal cellular colonies were picked and plated in complete medium in 96-well plates. After one week of incubation at 37° C., cells were trypsinized, plated into 2 replicate 96-well plates and incubated at 37° C. Once cells were 80 to 100% confluent, genomic DNA extraction was performed on one of the replicate plates with the ZR-96 genomic DNA kit (Zymo Research) according to the supplier's protocol. The other replicate was used to isolate gene-targeted clones and expand them.

d) PCR Identification of Gene Targeted Clones

Gene targeting was determined by PCR on genomic DNA using the primers SH6 GHGF3: 5′-CAATGGAGTTTTGGAGCCAC-3′ (SEQ ID NO: 280) and NeoR9: 5′-ATCAGAGCAGCCGATTGTCT-3′ (SEQ ID NO: 281). The PCRs result in a 2300 bp gene targeting specific PCR product (Figure XX). The SH6 GHGF3 primer (SEQ ID NO: 280) is a forward primer located upstream of the left homology arm of the donor repair plasmid. The NeoR9 primer (SEQ ID NO: 281) is a reverse primer located in the exogenous DNA inserted between the two homology arms of the donor repair plasmid.

e) Expression of Genes Upstream and Downstream from Sh6 Locus:

Gene expression was measured by quantitative RT-PCR. RNA was isolated from subconfluent cellular clones using the RNeasy RNA isolation kit (Qiagen) according to the manufacturer's protocol. 3 μg of RNA was used to generate cDNA using the Superscript III First-strand kit (Invitrogen). Quantitative PCR was performed on 10 ng of cDNA per 12 μl reaction, in duplicate samples, using SYBR® Premix Ex Taq™ DNA Polymerase (Lonza) on a Stratagene MPX3000 instrument. For each gene, the primers used are listed in the following table:

| Gene | Forward primer | SEQ ID NO: | Reverse primer | SEQ ID NO: |
|---|---|---|---|---|
| HPRT | 5′-GCCAGACTTTGTTGGATTTG-3′ | 282 | 5′-CTCTCATCTTAGGCTTTGTATTTTG-3′ | 283 |
| USP25 | 5′-CAGAGGACATGATGAAGAATTGA-3′ | 284 | 5′-CTCGATCCTCTCCAGATTCG-3′ | 285 |
| NRIP1 | 5′-GCACTGTGGTCAGACTGCAT-3′ | 286 | 5′-TTCCATCGCAATCAGAGAGA-3′ | 287 |
| CXADR | 5′-CTTATCATCTTTTGCTGTCG-3′ | 288 | 5′-TACTGCCGATGTAGCTTCTG-3′ | 289 |
| BTG3 | 5′-CCAGAAAAACCATCGAAAGG-3′ | 290 | 5′-GGTCACTATACAAGATGCAGC-3′ | 291 |
| C21orf91 | 5′-AAACACTCTCCTTCTGCCACA-3′ | 292 | 5′-ATGGCCCCTTAATGATTTGG-3′ | 293 |

The threshold cycles (Ct) were determined with Stratagene software on fluorescence (dRn) after normalization by the ROX reference dye. The intensity of gene expression was calculated using the formula 2^(Ct(HPRT)-Ct(Gene)), the expression of the housekeeping gene HPRT being used as an internal normalizing factor.
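The normalization formula above can be written as a short function. This is a minimal sketch: the Ct values in the usage lines are hypothetical numbers chosen to illustrate the arithmetic and are not data from the experiments, and the function names are introduced here for illustration.

```python
def relative_expression(ct_hprt, ct_gene):
    """Expression of a gene relative to HPRT: 2^(Ct(HPRT) - Ct(Gene))."""
    return 2.0 ** (ct_hprt - ct_gene)


def mean_relative_expression(ct_pairs):
    """Average the relative expression over replicate (Ct(HPRT), Ct(Gene)) pairs."""
    values = [relative_expression(h, g) for h, g in ct_pairs]
    return sum(values) / len(values)


# Hypothetical duplicate measurements for one gene in one clone.
duplicates = [(24.0, 26.0), (24.0, 26.0)]
print(mean_relative_expression(duplicates))
```

A gene whose threshold cycle is two cycles later than that of HPRT is thus scored at one quarter of the HPRT level, and a gene with the same Ct as HPRT is scored at 1.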

Example 8.2. Results

Human embryonic kidney 293H cells were co-transfected with 2 vectors: a plasmid expressing one of the three single-chain SH6 meganucleases and the donor repair plasmid pCLS3779. The cells were then plated at 300 cells per 10 cm dish and 2 weeks later clonal colonies were isolated and plated in 96-well microplates. Genomic DNA derived from these cells was analyzed for gene targeting by PCR as described in Materials and Methods. RNA was isolated from clones showing targeted integration and from negative controls. Quantitative RT-PCR was performed to measure expression of genes surrounding the locus of targeted integration. The data are presented in FIG. 17, where the average intensity of duplicate samples is shown for 3 individual targeted clones (KI) and 3 individual non-targeted clones (WT) after normalization with the housekeeping gene HPRT. No significant difference is observed for any of the 5 genes measured, indicating that targeted integration at the sh6 locus has no consequence on the expression of neighboring genes.

Example 9 Mutagenesis at Endogenous Safe Harbor Loci in Human Cells

To validate the cleavage activity of engineered single-chain safe harbor meganucleases, their ability to stimulate mutagenesis at endogenous human safe harbor loci was evaluated. Cells were transfected with mammalian expression plasmids for single chain molecules. Cleavage of a native safe harbor locus by the meganuclease yields a substrate for non-homologous end joining, which is an error-prone process and can result in small insertions or deletions at the meganuclease target site. Thus, the frequency at which mutations occur at an endogenous safe harbor locus is indicative of the cleavage efficiency of the genomic target site by the meganuclease.

Example 9.1. Materials and Methods

a) Meganuclease Expression Plasmids

The coding sequences for the meganucleases used in this example were cloned in a mammalian expression vector, resulting in the plasmids listed in table XVI.

TABLE XVI Meganucleases targeting safe harbour sequences

| locus targeted | meganuclease | plasmid | SEQ ID NO |
|---|---|---|---|
| sh3 | SCOH-SH3-b1-C | pCLS2697 | 31 |
| sh4 | SCOH-SH4-b1-C | pCLS2705 | 39 |
| sh6 | QCSH61-H01 | pCLS3690 | 81 |
| sh6 | QC-SH6-H01_V2_7E_70R75D | pCLS4373 | 85 |
| sh6 | QC-SH6-H01_7E | pCLS4377 | 83 |
| sh6 | SCOH-SH6-b12-G2_BQY | pCLS6567 | 294 |
| sh6 | SCOH-SH6-b11-G2.2_BQY | pCLS6570 | 295 |
| sh8 | SCOH-SH8 | pCLS3894 | 88 |
| sh13 | SCOH-SH13 | pCLS3897 | 90 |
| sh18 | SCOH-SH18-b11-C.2 | pCLS5519 | 128 |
| sh19 | SCOH-SH19 | pCLS3899 | 91 |
| sh31 | SCOH-SH31.2 | pCLS4076 | 132 |
| sh39 | SCOH-SH39-b11-C | pCLS6038 | 133 |
| sh41 | SCOH-SH41-b11-C | pCLS5187 | 135 |
| sh42 | SCOH-SH42-b11-C | pCLS5549 | 137 |
| sh43 | SCOH-SH43-b12-C | pCLS5595 | 140 |
| sh44 | SCOH-SH44-b11-C | pCLS5868 | 141 |
| sh52 | SCOH-SH52-b12-C | pCLS5871 | 144 |

b) Safe Harbor Locus Mutagenesis Experiments

Human embryonic kidney 293H cells (Invitrogen) were plated at a density of 1×10⁶ cells per 10 cm dish in complete medium (DMEM supplemented with 2 mM L-glutamine, penicillin (100 IU/ml), streptomycin (100 μg/ml), amphotericin B (Fungizone) (0.25 μg/ml) (Invitrogen-Life Science) and 10% FBS). The next day, cells were transfected with 3 μg of single-chain meganuclease expression vector using Lipofectamine 2000 transfection reagent (Invitrogen) according to the supplier's protocol. After 2 to 6 days of incubation at 37° C., cells were trypsinized and genomic DNA extraction was performed with the DNeasy blood and tissue kit (Qiagen) according to the supplier's protocol.

c) Deep Sequencing Analysis of Mutagenesis Events

The frequency of mutagenesis was determined by deep sequencing analysis. Oligonucleotides were designed for PCR amplification of a DNA fragment surrounding each safe harbour target and are listed in table XVII.

TABLE XVII PCR primers for mutagenesis analysis of safe harbour targets

| locus targeted | forward primer | SEQ ID NO | reverse primer | SEQ ID NO |
|---|---|---|---|---|
| sh3 | 5′-TGGGGGTCTTACTCTGTTTCCCAG-3′ | 296 | 5′-AGGAGAGTCCTTCTTTGGCCAAT-3′ | 297 |
| sh4 | 5′-GAGTGATAGCATAATGAAAACCCA-3′ | 298 | 5′-CTCACCATAAGTCAACTGTCTCAG-3′ | 299 |
| sh6 | 5′-TCTTTGTGTTTCCAAAGAGTTCCTTTGGCTTTCAC-3′ | 300 | 5′-GAATGGTCTGAAAATGGAGAGGTTAAATGAGATTT-3′ | 301 |
| sh8 | 5′-ACTAAATATGTTAATTGTGTGTATACAGTTTTTGT-3′ | 302 | 5′-ATTGCTACTTCATTTGTTATGTTAACTATGACATG-3′ | 303 |
| sh13 | 5′-TTTTTGTGGGTCCACAGTAGGTGTATATATTTATGG-3′ | 304 | 5′-CAGTTGAACTCATGGATGTAGAGAGTAGAAGAATG-3′ | 305 |
| sh18 | 5′-GACCTGAAGCTCAGGTACTT-3′ | 306 | 5′-AGTGGTGGTAGGCAGGACAT-3′ | 307 |
| sh19 | 5′-CTTAGGTAAACCTCAAAACAACAAGAGAGGAGCAA-3′ | 308 | 5′-CTGCTAGAGCCCGTAATGTTTCAATCATAGTTATT-3′ | 309 |
| sh31 | 5′-TTCAGGTTAGGTGACCTTCAAACT-3′ | 310 | 5′-AAGACCAGGCTGGGCAACCATAGC-3′ | 311 |
| sh39 | 5′-GAATAATGGAATAAACCCAGAGAGAAACAGAG-3′ | 312 | 5′-GTGTTCAAGGAAAATGGAGTGATATTAGGAAT-3′ | 313 |
| sh41 | 5′-GGAGATATCATTAAAAGAGGCATT-3′ | 314 | 5′-ATTACAATAGCCTTAGGAAACTAG-3′ | 315 |
| sh42 | 5′-GAGTCACAGCCACCTTACATTTTACTTTTC-3′ | 316 | 5′-AAGTAGAACACATTCCTATTTCCATTAAGT-3′ | 317 |
| sh43 | 5′-ATTAAGTACAAAATTTGGTCCAAT-3′ | 318 | 5′-AAAGTTGATTCATCTGAAACATG-3′ | 319 |
| sh44 | 5′-GCAGCGATCCATGGTGGAGA-3′ | 320 | 5′-TAACACAGGCTCATGTAGGT-3′ | 321 |
| sh52 | 5′-ATGTTATTCGAGGACCCACT-3′ | 322 | 5′-GTGACAACTCTGCTAGAAGA-3′ | 323 |

Nucleotides were added to obtain a fragment flanked by specific adaptor sequences (5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3′; SEQ ID NO 324) and (5′-CCTATCCCCTGTGTGCCTTGGCAGTCTCAG-3′; SEQ ID NO 325) provided by the company offering the sequencing service (GATC Biotech AG, Germany) on the 454 sequencing system (454 Life Sciences). An average of 18,000 sequences was obtained from pools of 2 to 3 amplicons (500 ng each). After sequencing, different samples were identified based on barcode sequences introduced in the first of the above adaptors.
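One way to picture the adaptor-flanked amplicon primers is sketched below. The text does not state which adaptor is attached to which primer or where exactly the barcode sits, so this example assumes that the first adaptor (SEQ ID NO 324) is fused 5′ of the forward primer with the barcode placed between adaptor and primer, and that the second adaptor (SEQ ID NO 325) is fused 5′ of the reverse primer. The barcode sequence and function name are hypothetical; the sh18 primers are taken from table XVII.

```python
ADAPTOR_A = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"  # SEQ ID NO 324
ADAPTOR_B = "CCTATCCCCTGTGTGCCTTGGCAGTCTCAG"  # SEQ ID NO 325


def fusion_primers(forward, reverse, barcode=""):
    """Build adaptor-tagged primers (5'->3').

    Assumed layout, not specified in the text:
      forward: adaptor A + barcode + locus-specific forward primer
      reverse: adaptor B + locus-specific reverse primer
    """
    return ADAPTOR_A + barcode + forward, ADAPTOR_B + reverse


# sh18 primers from table XVII (SEQ ID NO 306 and 307).
SH18_FWD = "GACCTGAAGCTCAGGTACTT"
SH18_REV = "AGTGGTGGTAGGCAGGACAT"
EXAMPLE_BARCODE = "ACGAGTGCGT"  # hypothetical 10-nt sample barcode

fwd, rev = fusion_primers(SH18_FWD, SH18_REV, EXAMPLE_BARCODE)
print(fwd)
print(rev)
```

With one barcode per sample, reads from pooled amplicons can be assigned back to their sample by matching the bases that follow the first adaptor.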

Example 9.2. Results

Human embryonic kidney 293H cells were transfected with a plasmid expressing a single-chain safe harbor meganuclease. After 2 to 6 days of incubation at 37° C., genomic DNA was isolated and PCR was used to amplify the genomic sequence surrounding the meganuclease target site. Sequences were then analyzed for the presence of insertion or deletion events (InDels) in the cleavage site of each safe harbor target. Results are summarized in table XVIII.

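TABLE XVIII Mutagenesis by meganucleases targeting safe harbor loci

| locus targeted | Cleaved by meganucleases of SEQ ID NO: | Plasmids | % InDels |
|---|---|---|---|
| sh3 | 31 | 2697 | 0.8 |
| sh4 | 39 | 2705 | 0.2 |
| sh6 | 81 | 3690 | 0.6 |
| sh6 | 85 | 4373 | 3.5 |
| sh6 | 83 | 4377 | 1.5 |
| sh6 | 294 | 6567 | 1 |
| sh6 | 295 | 6570 | 3 |
| sh8 | 88 | 3894 | 0.5 |
| sh13 | 90 | 3897 | 1.5 |
| sh18 | 128 | 5519 | 1.2 |
| sh19 | 91 | 3899 | 0.9 |
| sh31 | 132 | 4076 | 5 |
| sh39 | 133 | 6038 | 1.5 |
| sh41 | 135 | 5187 | 0.4 |
| sh42 | 137 | 5549 | 0.7 |
| sh43 | 140 | 5595 | 0.4 |
| sh44 | 141 | 5868 | 3.6 |
| sh52 | 144 | 5871 | 3.2 |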

Example 10 Conclusion

In conclusion, Examples 1, 2, 3 and 5 demonstrate that both I-CreI heterodimeric proteins and single-chain meganucleases capable of cleaving the SH3, the SH4 and the SH6 loci can be obtained. Moreover, these endonucleases are capable of cleaving these loci with a strong cleavage activity.

Example 4 demonstrates that single-chain meganucleases capable of cleaving the SH3 and the SH4 loci allow efficient insertion of a transgene into a target site of a human cell.

These endonucleases can thus advantageously be used to insert a transgene into the SH3, the SH4 or the SH6 locus of an individual.

Example 6 demonstrates that at least two single chain molecules according to the invention are capable of inducing high levels of gene targeting at an endogenous sh6 locus.

Example 7 demonstrates that targeted integration at a locus can support functional transgene expression.

Example 8 demonstrates that a targeted integration at a locus does not substantially modify expression of five genes located in the vicinity of the target sequence.

Example 9 demonstrates mutagenesis frequencies for different meganucleases targeting safe harbor sequences, which are indicative of the cleavage efficiency of the genomic target site by said meganucleases.

1. A variant endonuclease capable of cleaving a target sequence for use in inserting a transgene into the genome of an individual, wherein i. said genome comprises a locus comprising said target sequence; and ii. said target sequence is located at a distance of at most 200 kb from a retroviral insertion site (RIS), wherein said RIS is neither associated with cancer nor with abnormal cell proliferation.
 2. The endonuclease according to claim 1, wherein insertion of said transgene does not substantially modify expression of genes located in the vicinity of the target sequence.
 3. The endonuclease according to claim 1, wherein said target sequence is located at a distance of at least 100 kb from the nearest genes.
 4. The endonuclease according to claim 1, wherein said endonuclease is a homing endonuclease.
 5. The endonuclease according to claim 1, wherein said endonuclease is capable of cleaving a target sequence located within a locus selected from the group consisting of the SH6 locus on human chromosome 21q21.1, the SH3 locus on human chromosome 6p25.1, the SH4 locus on human chromosome 7q31.2, the SH12 locus on human chromosome 13q34, the SH13 locus on human chromosome 3p12.2, the SH19 locus on human chromosome 22, the SH20 locus on human chromosome 12q21.2, the SH21 locus on human chromosome 3p24.1, the SH33 locus on human chromosome 6p12.2, the SH7 locus on human chromosome 2p16.1, the SH8 locus on human chromosome 5, the SH18 locus, the SH31 locus, the SH38 locus, the SH39 locus, the SH41 locus, the SH42 locus, the SH43 locus, the SH44 locus, the SH45 locus, the SH46 locus, the SH47 locus, the SH48 locus, the SH49 locus, the SH50 locus, the SH51 locus, the SH52 locus, the SH70 locus, the SH71 locus, the SH72 locus, the SH73 locus, the SH74 locus, the SH75 locus, the SH101 locus, the SH106 locus, the SH107 locus, the SH102 locus, the SH105 locus, the SH103 locus, the SH104 locus, the SH113 locus, the SH109 locus, the SH112 locus, the SH108 locus, the SH110 locus, the SH114 locus, the SH116 locus, the SH111 locus, the SH115 locus, the SH121 locus, the SH120 locus, the SH122 locus, the SH117 locus, the SH118 locus, the SH119 locus, the SH123 locus, the SH126 locus, the SH128 locus, the SH129 locus, the SH124 locus, the SH131 locus, the SH125 locus, the SH127 locus, the SH130 locus, the SH11 locus, the SH17 locus, the SH23 locus, the SH34 locus, the SH40 locus, the SH53 locus, the SH54 locus, the SH55 locus, the SH56 locus, the SH57 locus, the SH58 locus, the SH59 locus, the SH60 locus, the SH61 locus, the SH62 locus, the SH65 locus, the SH67 locus, the SH68 locus and the SH69 locus.
 6. A variant dimeric I-CreI protein comprising two monomers that each comprises a sequence at least 80% identical to SEQ ID NO: 1 or SEQ ID NO: 42, wherein: i. said dimeric I-CreI protein is capable of cleaving a target sequence located within a locus of an individual, said target sequence being located at a distance of at most 200 kb from a retroviral insertion site (RIS), and said RIS being neither associated with cancer nor with abnormal cell proliferation; and ii. said target sequence does not comprise a sequence of SEQ ID NO: 4.
 7. The dimeric I-CreI protein according to claim 6, wherein said dimeric I-CreI protein is capable of cleaving a target sequence located within a locus selected from the group consisting of the SH6 locus on human chromosome 21q21.1, the SH3 locus on human chromosome 6p25.1, the SH4 locus on human chromosome 7q31.2, the SH12 locus on human chromosome 13q34, the SH13 locus on human chromosome 3p12.2, the SH19 locus on human chromosome 22, the SH20 locus on human chromosome 12q21.2, the SH21 locus on human chromosome 3p24.1, the SH33 locus on human chromosome 6p12.2, the SH7 locus on human chromosome 2p16.1, the SH8 locus on human chromosome 5, the SH18 locus, the SH31 locus, the SH38 locus, the SH39 locus, the SH41 locus, the SH42 locus, the SH43 locus, the SH44 locus, the SH45 locus, the SH46 locus, the SH47 locus, the SH48 locus, the SH49 locus, the SH50 locus, the SH51 locus, the SH52 locus, the SH70 locus, the SH71 locus, the SH72 locus, the SH73 locus, the SH74 locus, the SH75 locus, the SH101 locus, the SH106 locus, the SH107 locus, the SH102 locus, the SH105 locus, the SH103 locus, the SH104 locus, the SH113 locus, the SH109 locus, the SH112 locus, the SH108 locus, the SH110 locus, the SH114 locus, the SH116 locus, the SH111 locus, the SH115 locus, the SH121 locus, the SH120 locus, the SH122 locus, the SH117 locus, the SH118 locus, the SH119 locus, the SH123 locus, the SH126 locus, the SH128 locus, the SH129 locus, the SH124 locus, the SH131 locus, the SH125 locus, the SH127 locus, the SH130 locus, the SH11 locus, the SH17 locus, the SH23 locus, the SH34 locus, the SH40 locus, the SH53 locus, the SH54 locus, the SH55 locus, the SH56 locus, the SH57 locus, the SH58 locus, the SH59 locus, the SH60 locus, the SH61 locus, the SH62 locus, the SH65 locus, the SH67 locus, the SH68 locus and the SH69 locus.
 8. The dimeric I-CreI protein according to claim 6, wherein said dimeric I-CreI protein is capable of cleaving a target sequence located within the SH6 locus on human chromosome 21q21.1.
 9. The dimeric I-CreI protein according to claim 8, wherein said target sequence comprises the sequence of SEQ ID NO: 59.
 10. A fusion protein comprising the monomers of the dimeric I-CreI protein as defined in claim 6.
 11. The fusion protein according to claim 10, wherein said fusion protein comprises a sequence selected from the group consisting of SEQ ID Nos. 81, 82-85, 294, 295, 76-80, 25-40, 86-96, 127-150, 182-213, 235-270 and 275-278.
 12. A nucleic acid encoding: a) a variant endonuclease capable of cleaving a target sequence for use in inserting a transgene into a genome of an individual, wherein: i) said genome comprises a locus comprising said target sequence; and ii) said target sequence is located at a distance of at most 200 kb from a retroviral insertion site (RIS), wherein said RIS is neither associated with cancer nor with abnormal cell proliferation; or b) a variant dimeric I-CreI protein according to claim 6.
 13. An expression vector comprising the nucleic acid as defined in claim 12.
 14. The expression vector according to claim 13, further comprising a targeting construct comprising a transgene and two sequences homologous to the genomic sequence flanking a target sequence recognized by the endonuclease.
 15. A combination of: an expression vector as defined in claim 13; and a vector comprising a targeting construct comprising a transgene and two sequences homologous to the genomic sequence of a target sequence recognized by the endonuclease.
 16. A pharmaceutical composition comprising the expression vector as defined in claim 14, and a pharmaceutically acceptable carrier.
 17. A method of inserting a transgene into a genome of a cell, tissue or non-human animal comprising administering to said cell, tissue or non-human animal an endonuclease according to claim 1.
 18. A method of making a non-human animal model of a hereditary disorder comprising the method of claim 17.
 19. A method of producing a recombinant protein comprising the method of claim 17.
 20. A method for obtaining an endonuclease suitable for inserting a transgene into the genome of an individual, comprising the step of: a) selecting, within the genome of said individual, a retroviral insertion site (RIS) that is neither associated with cancer nor with abnormal cell proliferation; b) defining a genomic region extending 200 kb upstream and 200 kb downstream of said RIS; and c) identifying a wild-type endonuclease or constructing a variant endonuclease capable of cleaving a target sequence located within said genomic region.
 21. A pharmaceutical composition comprising the combination as defined in claim 15 and a pharmaceutically acceptable carrier.
 22. A method of inserting a transgene into a genome of a cell, tissue or non-human animal comprising administering to said cell, tissue or non-human animal a variant according to claim 6.
 23. A method of inserting a transgene into a genome of a cell, tissue or non-human animal comprising administering to said cell, tissue or non-human animal a nucleic acid according to claim 12.
 24. A method of inserting a transgene into a genome of a cell, tissue or non-human animal comprising administering to said cell, tissue or non-human animal an expression vector according to claim 13.
 25. A method of inserting a transgene into a genome of a cell, tissue or non-human animal comprising administering to said cell, tissue or non-human animal an expression vector according to claim 14.
 26. A method of inserting a transgene into a genome of a cell, tissue or non-human animal comprising administering to said cell, tissue or non-human animal a combination according to claim 15.
 27. A method of making a non-human animal model of a hereditary disorder comprising the method of claim 22.
 28. A method of making a non-human animal model of a hereditary disorder comprising the method of claim 23.
 29. A method of making a non-human animal model of a hereditary disorder comprising the method of claim 24.
 30. A method of making a non-human animal model of a hereditary disorder comprising the method of claim 25.
 31. A method of making a non-human animal model of a hereditary disorder comprising the method of claim 26.
 32. A method of producing a recombinant protein comprising the method of claim 22.
 33. A method of producing a recombinant protein comprising the method of claim 23.
 34. A method of producing a recombinant protein comprising the method of claim 24.
 35. A method of producing a recombinant protein comprising the method of claim 25.
 36. A method of producing a recombinant protein comprising the method of claim 26.
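By way of non-limiting illustration only, steps a) to c) of the method for obtaining an endonuclease (selecting a RIS, defining a region extending 200 kb upstream and 200 kb downstream of it, and retaining target sequences located within that region) may be sketched as follows. The names `Window`, `safe_harbor_window` and `candidate_targets`, the 1-based inclusive coordinate convention, the clipping to the chromosome ends and all coordinates in the usage lines are assumptions made for this sketch; none of them is taken from the present disclosure.

```python
from dataclasses import dataclass

FLANK_BP = 200_000  # 200 kb, the distance recited in the claims


@dataclass(frozen=True)
class Window:
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def safe_harbor_window(chrom, ris_pos, chrom_len, flank=FLANK_BP):
    """Region extending `flank` bp on each side of a RIS, clipped to the chromosome."""
    if not 1 <= ris_pos <= chrom_len:
        raise ValueError("RIS position lies outside the chromosome")
    return Window(chrom, max(1, ris_pos - flank), min(chrom_len, ris_pos + flank))


def candidate_targets(window, sites):
    """Keep the (chrom, position) sites that fall within the window."""
    return [
        (chrom, pos)
        for chrom, pos in sites
        if chrom == window.chrom and window.start <= pos <= window.end
    ]


# Hypothetical coordinates, for illustration only.
window = safe_harbor_window("chrN", ris_pos=150_000, chrom_len=1_000_000)
hits = candidate_targets(window, [("chrN", 100), ("chrN", 350_001), ("chrM", 100)])
```

In this sketch the selection of a RIS that is neither associated with cancer nor with abnormal cell proliferation (step a) is assumed to have been made beforehand, and the identification or construction of an endonuclease cleaving a retained site (the remainder of step c) is outside its scope.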